Chapter 1

Introduction 


Speed of design is essential in building large-scale, highly complex systems. This
issue becomes more apparent as emerging VLSI technologies lead to systems of
increasing size and complexity. Design automation accelerates the design process by
providing tools that improve the quality of a quickly designed circuit. Retiming, which
was introduced in [13, 14, 15] and treated in [17], is a well-known design automation
technique which aims at speeding the design process without sacrificing the quality
of the implementation. Retiming optimizes clocked circuits by relocating registers so
as to reduce combinational rippling. In this thesis we further investigate retiming and
provide results of practical as well as theoretical interest. We present optimal
algorithms for the optimization of combinational circuitry. We give a novel characterization of
the minimum clock-period of a circuit in terms of the maximum delay-to-register ratio
cycle in the circuit, which leads to improved algorithms for minimum clock-period
and approximately minimum clock-period retiming. We exhibit the group-theoretic
structure of retiming on circuits with unit-delay components. Finally, we give an
efficient algorithm for a mixed-integer optimization problem which arises in the linear
programming framework of retiming.

In Chapter 2 we introduce the basic concepts of retiming. We define the notation
and terminology and review the graph-theoretic model of digital circuits from [15, 17].
We give an algorithm that transforms a given combinational circuit into a functionally
equivalent pipelined circuit with minimum latency and clock-period no greater than
a given upper bound c. The algorithm runs in $O(E)$ steps, where E is the number of
interconnections in the circuit, and is optimal within a constant factor. The operation
of  the  algorithm  is  based  on  the  notion  of  accumulated  delay  along  a  path  in  the  circuit. 

In Chapter 3 we give a novel and concise characterization of the minimum feasible
clock-period of a circuit in terms of the maximum delay-to-register ratio cycle in the
circuit graph. We prove that the minimum feasible clock-period does not exceed this
ratio by more than an additive term D, where D is the maximum delay of the
processing elements in the circuit. This observation establishes a range of possible
values for the minimum clock-period that is independent of the size of the circuit;
the range depends solely on the delays of the individual components used.

Based on the maximum ratio cycle characterization of the minimum clock-period
we approach a variety of retiming problems. For combinational circuits we give an
optimal $O(E)$ algorithm that transforms a unit-delay combinational circuit into a
pipelined circuit with minimum clock-period and latency no greater than a given upper
bound l. We also give a more general $O(E \lg D)$ algorithm for the same problem on
combinational circuits with arbitrary delays. We show how to obtain a minimum
clock-period retiming of a unit-delay circuit in $O(\min\{\sqrt{V}\,E \lg(VW),\ VE\})$ steps,
where V is the number of processing elements in the circuit and W is the maximum
number of registers on a wire in the circuit, by direct application of graph-theoretic
algorithms for finding the minimum cycle mean in a graph [11, 20]. We demonstrate
how to obtain a minimum clock-period retiming of a circuit with arbitrary delays
in $O(VE \lg D)$ steps. The best previously known strongly polynomial algorithm for
minimum clock-period retiming of synchronous circuitry, unit-delay or arbitrary-delay,
required $O(VE \lg V)$ steps [17]. Finally, if the retimed circuit is allowed a clock-period
which does not exceed the minimum possible by more than D, we show how to obtain
it in $O(\min\{\sqrt{V}\,E \lg(VW) \lg(VD),\ VE \lg(VD)\})$ steps. The running times of the
algorithms in Chapters 2 and 3 are summarized in the table of Figure 1.1.

In Chapter 4 we investigate group-theoretic properties of retiming. We demonstrate
the closed semiring structure of retiming on unit-delay circuits and we give a
Bellman-Ford type algorithm, with redefined additive and multiplicative operations,
for retiming unit-delay circuitry. Its running time is $O(VE)$ and matches the best
previously known strongly polynomial algorithm for the same problem [17].

In Chapter 5 we investigate a mixed-integer optimization problem which arises in
the linear programming framework of retiming. We give a polynomial time algorithm
for a generic mixed-integer optimization problem that we call restricted mixed-integer
dual of an uncapacitated minimum-cost flow. The polynomial running time is achieved
by introducing a set of additional, appropriately chosen constraints. The same idea
was used for the solution of a similar mixed-integer problem in [22, 16], which did
not, however, involve an objective to be optimized. The technique of introducing
additional constraints, or cuts as they are known in the literature, in order to solve
mixed-integer optimization problems is known in general to require an exponential
number of steps [21, 23, 3, 18]. Aharoni, Erdős and Linial [1] pose the question of
whether a clever choice of cuts can yield polynomial time algorithms. We show that
this is possible for the problem we consider, by choosing the cuts in a way that reduces
the original mixed-integer problem to a network flow problem.

Circuit Type        Transformation             Running Time
Combinational       Min Latency Pipelining     $O(E)$
UD Combinational    Min Clock-Period           $O(E)$
Combinational       Min Clock-Period           $O(E \lg D)$
UD Sequential       Min Clock-Period           $O(\min\{\sqrt{V}\,E \lg(VW),\ VE\})$
Sequential          Min Clock-Period           $O(VE \lg D)$
Sequential          Approx Min Clock-Period    $O(\min\{\sqrt{V}\,E \lg(VW) \lg(VD),\ VE \lg(VD)\})$

Figure 1.1: Summary of problems and running times of corresponding algorithms. For
the sake of simplicity we denote a set S and its cardinality |S| by the same symbol.
The initials UD denote unit-delay circuitry.


Chapter  2 


Minimum  Latency  Pipelining 


In this chapter we review the basic concepts of retiming and describe an $O(E)$
algorithm for minimum latency pipelining of combinational circuitry. The running time
of the algorithm is optimal within a constant factor. The chapter is organized as
follows. Section 2.1 defines the terminology used in the rest of the thesis and presents
a mathematical framework of retiming. Section 2.2 gives a brief overview of the
relation between the problem of satisfying a given set of difference constraints and the
problem of finding single-source shortest-paths in a graph. This relation serves as a
basis for proving the correctness of our algorithm for minimum latency pipelining of
combinational circuitry. Both the algorithm and its correctness proof are given in
Section 2.3.


2.1  Preliminaries 

In this section we define the notation and terminology needed in the rest of the thesis
and present the graph-theoretic model of digital circuits that we assume. We also describe
the operation of retiming and present a mathematical framework for it. The entire
framework presented in this section was introduced in [13, 14, 15] and was treated
thoroughly in [17].

We view a circuit abstractly as a network of functional elements and globally
clocked registers. The functional elements provide the computational power of the
circuit and the registers act as storage elements. Each functional element has an
associated propagation delay. The outputs of a functional element at any time are
defined as a specified function of its inputs, provided that all the inputs have been
stable for a time at least equal to the element's propagation delay.

We model a circuit as a finite, vertex-weighted, edge-weighted, directed multigraph
$G = (V, E, d, w)$. The vertices of the graph model the functional elements of the circuit.
Each vertex v is weighted with its numerical propagation delay $d(v)$. The directed
edges E of the graph model interconnections between functional elements. Each edge
$u \xrightarrow{e} v \in E$ connects an output of some functional element represented by vertex u to
an input of some functional element represented by vertex v. Each edge e is labeled
with a register count $w(e)$, which equals the number of registers along the connection.
We impose the restriction that there be no directed cycles in G of zero edge-weight,
thereby ensuring that no race conditions can arise. We define the clock-period $\Phi(G)$
of any synchronous circuit G as the maximum amount of propagation delay through
which any signal must ripple between clock ticks.
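The clock-period can be evaluated directly from this definition: signals ripple only through edges that carry no registers, and because G has no zero-weight cycles those edges form a directed acyclic graph. Hence $\Phi(G)$ is the largest total vertex delay along any path of zero-weight edges, which a longest-path pass in topological order computes. The Python sketch below is illustrative only; the input format (a dict of delays and a list of `(u, v, w)` triples) and the name `clock_period` are our assumptions, not part of the thesis.

```python
from collections import defaultdict, deque


def clock_period(delay, edges):
    """Return Phi(G), the largest delay along any register-free path.

    delay: dict vertex -> d(v); edges: list of (u, v, w) with w = w(e).
    Raises ValueError if the zero-weight edges contain a cycle.
    """
    succ = defaultdict(list)
    indeg = {v: 0 for v in delay}
    for u, v, w in edges:
        if w == 0:  # only register-free edges let a signal ripple onward
            succ[u].append(v)
            indeg[v] += 1
    # arrival[v]: largest delay of a register-free path ending at v, d(v) included
    arrival = dict(delay)
    queue = deque(v for v in delay if indeg[v] == 0)
    processed = 0
    while queue:  # Kahn's algorithm visits vertices in topological order
        u = queue.popleft()
        processed += 1
        for v in succ[u]:
            arrival[v] = max(arrival[v], arrival[u] + delay[v])
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if processed != len(delay):
        raise ValueError("zero-weight cycle: the circuit has a race condition")
    return max(arrival.values())
```

For example, a ring a → b → c → a with delays 3, 3, 7 whose only registers are two on the edge c → a has clock-period 13, since all three delays ripple through the two register-free edges.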

We shall view a simple path $p : u \rightsquigarrow v$ in G as a sequence of vertices and edges,
with no repetitions, that starts from a vertex u and ends at a vertex v. For any path
$p = v_0 \xrightarrow{e_0} v_1 \xrightarrow{e_1} \cdots \xrightarrow{e_{k-1}} v_k$ we define the path weight as the sum of the weights of the
edges of the path:

$$w(p) = \sum_{i=0}^{k-1} w(e_i).$$

We also define the path delay as the sum of the delays of the vertices of the path:

$$d(p) = \sum_{i=0}^{k} d(v_i).$$

In order that a graph $G = (V, E, d, w)$ have well-defined physical meaning as a
circuit, we place the restriction that the propagation delays $d(v)$ and the register
counts $w(e)$ are nonnegative integers for each vertex $v \in V$ and for each edge $e \in E$.

Retiming transformations alter the clock-period of a circuit by inserting and deleting
registers, but without otherwise affecting the circuit's structure. The new circuit is
functionally equivalent, as seen by the external world, to the original. A proof of this
equivalence can be found in [15], which also contains a technical definition of the term “equivalent”.
A retiming of a circuit $G = (V, E, d, w)$ is an integer-valued vertex labeling $r : V \to \mathbb{Z}$.
The retiming specifies a transformation of the original circuit in which registers are
added and removed so as to change the graph G into a new graph $G_r = (V, E, d, w_r)$.


The edge weighting $w_r$ is defined for an edge $u \xrightarrow{e} v$ by the equation

$$w_r(e) = w(e) + r(u) - r(v),$$

and the label $r(v)$ is referred to as the lead of vertex v. A retiming r of a circuit is
legal if the register counts $w_r$ of the retimed circuit $G_r$ are nonnegative, that is, if
$w_r(e) \ge 0$ for every edge $e \in E$.
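The retiming equation can be applied mechanically. The Python sketch below, with a hypothetical helper `retime` over `(u, v, w)` edge triples, computes $w_r$ for every edge under the lead convention of the text, $w_r(e) = w(e) + r(u) - r(v)$, and rejects a retiming that is not legal.

```python
def retime(edges, r):
    """Apply retiming r (dict vertex -> lead) to a list of (u, v, w) edges.

    Uses w_r(e) = w(e) + r(u) - r(v) and raises ValueError if the
    retiming is illegal, i.e. some retimed register count is negative.
    """
    retimed = []
    for u, v, w in edges:
        w_r = w + r[u] - r[v]
        if w_r < 0:
            raise ValueError(f"illegal retiming: edge {u}->{v} would get {w_r} registers")
        retimed.append((u, v, w_r))
    return retimed
```

With leads r(a) = r(b) = 0 and r(c) = -1 on a ring a → b → c → a whose edge c → a carries two registers, one register moves from c → a onto b → c; the total register count around the cycle stays 2, as it does under every retiming.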

In order to characterize the clock-period of a retimed circuit we define two
quantities:

$$W(u, v) = \min\{w(p) : p : u \rightsquigarrow v\},$$

$$D(u, v) = \max\{d(p) : p : u \rightsquigarrow v \text{ and } w(p) = W(u, v)\}.$$

The quantity $W(u, v)$ is the minimum number of registers on any path from vertex u
to vertex v. We call a path $p : u \rightsquigarrow v$ such that $w(p) = W(u, v)$ a critical path from u to
v. The quantity $D(u, v)$ is the maximum total propagation
delay on any critical path from u to v.

The  following  two  statements  about  D  are  important: 

F1 $D(u, v)$ can take on $O(V^2)$ values.

F2 Given a synchronous circuit G and a retiming r of G, the clock-period $\Phi(G_r)$
is equal to $D(u, v)$ for some $u, v \in V$.

Statements F1 and F2 are easily justified by the fact that there are $O(V^2)$ pairs of
vertices in the graph and that retiming does not change the propagation delay along
a critical path between any two vertices in the graph.

We can compute W and D by solving an all-pairs shortest-paths problem in G.
Common ways of solving this problem are the Floyd-Warshall method [12, page 86],
which runs in $O(V^3)$ time, and Johnson's algorithm [10], which runs in $O(VE + V^2 \lg V)$
time using the Fibonacci heap data structure due to Fredman and Tarjan [6].
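One way to obtain W and D together, which as we understand it is the approach taken in [17], is to run Floyd-Warshall on ordered pairs: each edge $u \xrightarrow{e} v$ gets the weight $(w(e), -d(u))$, pairs are added componentwise and compared lexicographically, and if the smallest pair over paths from u to v is $(x, y)$, then $W(u, v) = x$ and $D(u, v) = d(v) - y$. The Python sketch below is our own $O(V^3)$ rendering of that idea; the name `compute_w_d` and the use of `None` for unreachable pairs are illustrative choices.

```python
def compute_w_d(delay, edges):
    """Return nested dicts W[u][v] and D[u][v] (None if v is unreachable from u).

    delay: dict vertex -> d(v); edges: list of (u, v, w) triples.
    Assumes no zero-weight cycles, so every cycle is lexicographically positive.
    """
    vs = list(delay)
    # best[u][v]: smallest (registers, -delay) over paths u ~> v, where the
    # delay part sums d over every vertex of the path except v itself
    best = {u: {v: None for v in vs} for u in vs}
    for v in vs:
        best[v][v] = (0, 0)  # the trivial path
    for u, v, w in edges:
        pair = (w, -delay[u])
        if best[u][v] is None or pair < best[u][v]:
            best[u][v] = pair
    for k in vs:  # Floyd-Warshall; Python tuples compare lexicographically
        for i in vs:
            if best[i][k] is None:
                continue
            for j in vs:
                if best[k][j] is None:
                    continue
                pair = (best[i][k][0] + best[k][j][0], best[i][k][1] + best[k][j][1])
                if best[i][j] is None or pair < best[i][j]:
                    best[i][j] = pair
    W = {u: {v: None if best[u][v] is None else best[u][v][0] for v in vs} for u in vs}
    D = {u: {v: None if best[u][v] is None else delay[v] - best[u][v][1] for v in vs}
         for u in vs}
    return W, D
```

On the ring a → b → c → a with delays 3, 3, 7 and two registers on c → a, this gives, for example, W(c, b) = 2 and D(c, b) = 13.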

The following theorem, which is proven in [17], characterizes the conditions under
which a retiming produces a circuit whose clock-period is no greater than a given
constant.

Theorem 2.1 Let $G = (V, E, d, w)$ be a synchronous circuit, let c be an arbitrary
positive real number, and let r be a function from V to the integers. Then r is a legal
retiming of G such that $\Phi(G_r) \le c$ if and only if

$$r(v) - r(u) \le w(e) \tag{2.1}$$

for every edge $u \xrightarrow{e} v$ of G, and

$$r(v) - r(u) \le W(u, v) - 1 \tag{2.2}$$

for all vertices $u, v \in V$ such that $D(u, v) > c$. □

This theorem provides the basic tool needed to solve the retiming problem for a
given clock-period. Notice that the constraints on the unknowns $r(v)$ in the theorem
are linear inequalities involving only differences of unknowns. Using the Bellman-Ford
algorithm [12, page 74] we can test whether there exists a retimed circuit with
clock-period no greater than some constant c in $O(V^3)$ steps, since there can be $O(V^2)$
inequalities of the form (2.2). Leiserson and Saxe [17] give an asymptotically faster
algorithm, which runs in $O(VE)$ steps.
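The straightforward $O(V^3)$ test can be sketched as follows. Each constraint $r(v) - r(u) \le a$ becomes an edge $u \to v$ of weight a, and Bellman-Ford is started with every distance at 0, which is equivalent to adding a virtual source joined to every vertex by a zero-weight edge. By Lemmas 2.2 through 2.4 of Section 2.2, the resulting distances form a legal retiming with $\Phi(G_r) \le c$, and a negative cycle means that no such retiming exists. The name `feasible_retiming` and the input format (W and D as nested dicts with `None` for unreachable pairs) are our assumptions; this is not the faster $O(VE)$ algorithm of [17].

```python
def feasible_retiming(vertices, edges, W, D, c):
    """Return a legal retiming r with Phi(G_r) <= c, or None if none exists.

    edges: (u, v, w) triples; W, D: nested dicts W[u][v], D[u][v]
    (None where v is unreachable from u).
    """
    # Each constraint r(v) - r(u) <= a becomes a graph edge u -> v of weight a.
    cons = [(u, v, w) for u, v, w in edges]          # inequalities (2.1)
    for u in vertices:
        for v in vertices:
            if D[u][v] is not None and D[u][v] > c:
                cons.append((u, v, W[u][v] - 1))  # inequalities (2.2)
    # Bellman-Ford; starting every distance at 0 models a virtual source
    # with a zero-weight edge to each vertex.
    r = {v: 0 for v in vertices}
    for _ in range(len(vertices)):
        changed = False
        for u, v, a in cons:
            if r[u] + a < r[v]:
                r[v] = r[u] + a
                changed = True
        if not changed:
            return r  # every constraint holds: shortest-path distances
    return None  # still relaxing after |V| rounds: negative cycle, c infeasible
```

For the ring a → b → c → a with delays 3, 3, 7 and two registers on c → a, c = 7 is feasible (the sketch returns r(c) = -1 and zero leads elsewhere), while c = 6 is infeasible because the single element c already has delay 7.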

2.2 Difference Constraints and Shortest-Paths

In this section we exhibit the relation between the problem of satisfying a given set of
difference constraints and the problem of finding single-source shortest-paths in a graph
generated by the given set of constraints. We also give without proof an important
property of the single-source shortest-paths solution [12, 4]. The framework that we
develop in this section will be used extensively in the rest of this thesis.

We consider the problem of solving the following system of difference constraints.

Problem DC (Difference Constraints) Let S be a set of m linear constraints of the
form

$$x_j - x_i \le a_{ij} \tag{S}$$

on the n unknowns $x_1, x_2, \ldots, x_n$, where the $a_{ij}$ are given real constants. Determine a set
of feasible values for the unknowns $x_i$, or determine that no such set exists. □
The given system S induces an edge-weighted graph $G = (V, E, w)$. The vertex set
V is defined as

$$V = \{v : x_v \text{ is an unknown of } S\}.$$

The edge set E is defined as

$$E = \{u \to v : x_v - x_u \le a_{uv} \text{ is a constraint of } S\}.$$

Finally, for every edge $u \xrightarrow{e} v \in E$ we have

$$w(e) = a_{uv}.$$


Now, we define the single-source shortest-paths problem on an edge-weighted graph
$G = (V, E, w)$ from a source vertex $s \in V$.

Problem SSSP (Single-Source Shortest-Paths) Let $G = (V, E, w)$ be an edge-weighted
graph and let s be a vertex in V. Determine a value $l(v)$ for each vertex $v \in V$ such
that

$$l(v) = \min\{w(p) : p : s \rightsquigarrow v\}. \qquad \square$$

We give without proof three important lemmata [12].

Lemma  2.2  Problem  DC  is  solvable  if  and  only  if  Problem  SSSP  is  solvable.  □ 

Lemma 2.3 Problem SSSP is solvable if and only if there exist no directed cycles C
in G with weight $w(C) < 0$. □

Lemma 2.4 Let S be a system of m difference constraints of the form

$$x_j - x_i \le a_{ij}$$

on the n unknowns $x_1, x_2, \ldots, x_n$, where the $a_{ij}$ are given real constants. Let $G = (V, E, w)$
be the graph induced by S, and let $l(v)$ be the length of the shortest path in G from
the source $s \in V$ to vertex v. Then the assignment $x_v = l(v)$ for each vertex $v \in V$
satisfies the constraints in S and maximizes $x_v - x_s$ for every vertex $v \in V$. □
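A small worked instance, chosen here purely for illustration, shows Lemmas 2.2 to 2.4 at work. The system below on three unknowns induces a graph with edges 1 → 2 (weight 3), 2 → 3 (weight -2), 1 → 3 (weight 0) and 3 → 1 (weight 1). Its two cycles, 1 → 3 → 1 and 1 → 2 → 3 → 1, have weights 1 and 2, so by Lemma 2.3 the system is solvable, and shortest paths from the source s = 1 give:

$$
\begin{aligned}
&x_2 - x_1 \le 3, \quad x_3 - x_2 \le -2, \quad x_3 - x_1 \le 0, \quad x_1 - x_3 \le 1,\\
&l(1) = 0, \quad l(2) = 3, \quad l(3) = \min\{0,\ 3 + (-2)\} = 0,\\
&(x_1, x_2, x_3) = (l(1), l(2), l(3)) = (0, 3, 0).
\end{aligned}
$$

Every constraint holds, and $x_2 - x_1 = 3$ and $x_3 - x_1 = 0$ meet the bounds imposed directly by the constraints $x_2 - x_1 \le 3$ and $x_3 - x_1 \le 0$, so no feasible assignment makes these differences larger, as Lemma 2.4 predicts.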

These  three  lemmata  will  be  used  extensively  in  the  correctness  proofs  of  the 
algorithms  that  we  present  in  the  rest  of  the  thesis. 
2.3 The Algorithm

This section introduces the problem of minimum latency pipelining of combinational
circuitry and presents an efficient algorithm for its solution. The algorithm terminates
in $O(E)$ steps, and its performance is optimal within a constant factor. Its running
time is a significant improvement over the $O(VE)$ running time of the previously
known techniques for the general retiming problem.

In a combinational circuit all register counts are zero and thus the circuit graph is
acyclic. We consider the circuit to have one input interface $v_I$ and one output interface
$v_O$. By retiming a combinational circuit G, we can produce a pipelined circuit $G_r$ which
achieves a shorter clock-period at the cost of introducing a latency of $r(v_I) - r(v_O)$
clock ticks for signals to propagate from the input interface $v_I$ to the output interface
$v_O$.

The problem of minimum latency pipelining is defined as follows: Given a
combinational circuit $G = (V, E, d, 0)$ with input interface $v_I$ and output interface $v_O$, and
a positive integer c, find a legal retiming r of G such that $\Phi(G_r) \le c$ and the latency
$r(v_I) - r(v_O)$ of the retimed circuit is as small as possible. Stated in mathematical
terms, we want to solve the following problem:

Problem MLP (Minimum Latency Pipelining) Given a combinational circuit
$G = (V, E, d, 0)$ with input interface $v_I$ and output interface $v_O$, determine a value
$r(v)$ for each vertex $v \in V$ that minimizes $r(v_I) - r(v_O)$ subject to

$$r(v) - r(u) \le 0 \tag{2.3}$$

for every edge $u \to v \in E$, and

$$r(v) - r(u) \le -1 \tag{2.4}$$

for all vertices $u, v \in V$ such that $D(u, v) > c$. □

According to Section 2.2, Problem MLP can be viewed as a single-source shortest-paths
problem on the constraint graph $G_c = (V_c, E_c, w_c)$, which is defined in the following
manner:

$$V_c = V,$$

$$E_c = \{u \to v : r(v) - r(u) \text{ is constrained by (2.3) or (2.4)}\},$$

$$w_c(u \to v) = \begin{cases} 0 & \text{if } r(v) - r(u) \text{ is constrained by (2.3)},\\ -1 & \text{if } r(v) - r(u) \text{ is constrained by (2.4)}, \end{cases}$$

where, when both constraints apply to the same pair, the tighter weight $-1$ is used.

A feasible assignment of values to the unknowns of Problem MLP can be obtained in
$O(VE)$ steps by using the general techniques described in Section 2.1.
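For small circuits, Problem MLP can also be solved straight from the constraint graph $G_c$ defined above, which gives a useful cross-check for a faster method. In a combinational circuit $W(u, v) = 0$ whenever v is reachable from u, so $D(u, v)$ is simply the largest path delay from u to v. Every edge of $G_c$ joins a vertex to one reachable from it in G, so $G_c$ is acyclic and shortest paths from $v_I$ can be found by relaxing vertices in a topological order of G. The Python sketch below does this in $O(V(V + E))$ time; the name `mlp_reference`, the `(u, v)` edge format, and the assumptions that every vertex is reachable from $v_I$ and that $d(v) \le c$ for all v are ours.

```python
from graphlib import TopologicalSorter


def mlp_reference(delay, edges, source, c):
    """Solve Problem MLP by shortest paths in the constraint graph G_c.

    delay: dict vertex -> d(v); edges: list of (u, v) pairs of an acyclic circuit.
    Assumes every vertex is reachable from `source`. Returns r with r[source] == 0.
    """
    if any(d > c for d in delay.values()):
        raise ValueError("some d(v) > c: no retiming achieves clock-period c")
    preds = {v: [] for v in delay}
    for u, v in edges:
        preds[v].append(u)
    order = list(TopologicalSorter(preds).static_order())
    pos = {v: i for i, v in enumerate(order)}

    # longest[u][v] = D(u, v): the largest d(p) over paths p from u to v
    longest = {}
    for u in order:
        best = {u: delay[u]}
        for v in order[pos[u] + 1:]:
            reach = [best[p] for p in preds[v] if p in best]
            if reach:
                best[v] = max(reach) + delay[v]
        longest[u] = best

    # Shortest paths from the source in G_c, relaxing in topological order of G.
    r = {source: 0}
    for v in order[pos[source] + 1:]:
        cands = []
        for u in order[:pos[v]]:
            if u not in r or v not in longest[u]:
                continue
            if longest[u][v] > c:
                cands.append(r[u] - 1)  # constraint (2.4): edge of weight -1
            elif u in preds[v]:
                cands.append(r[u])      # constraint (2.3): edge of weight 0
        if cands:
            r[v] = min(cands)
    return r
```

The latency of the resulting pipeline is $r(v_I) - r(v_O) = -r(v_O)$.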

We present Algorithm MLP, which yields a solution to Problem MLP in $O(E)$
steps. The running time of the algorithm is optimal within a constant factor. For each
vertex v in the graph, Algorithm MLP maintains its stage $|r(v)|$ and its accumulated
delay $\delta(v)$. The stage of a vertex v is the number of registers along any path from the
input interface $v_I$ to the vertex v. The accumulated delay of a vertex v is the longest
delay of a signal coming into that vertex from a preceding register. The algorithm
operates as follows:

Algorithm MLP (Minimum Latency Pipelining) Given a combinational circuit G
and a desired clock-period c, this algorithm determines a pipelined circuit $G_r$
with clock-period $\Phi(G_r) \le c$ and minimum latency.

1. For each vertex $v \in V$, set $r(v) \leftarrow 0$ and $\delta(v) \leftarrow d(v)$.

2. Visit the edges $u \to v$ in topological sort order. For each edge $u \to v$ do:

2.1. If $r(v) > r(u)$, then $r(v) \leftarrow r(u)$.

2.2. If $\delta(u) + d(v) > c$ and $r(v) \ge r(u)$, then $r(v) \leftarrow r(u) - 1$.

2.3. If $\delta(u) + d(v) > \delta(v)$ and $r(u) = r(v)$, then $\delta(v) \leftarrow \delta(u) + d(v)$.

3. For each edge $u \xrightarrow{e} v \in E$, set $w_r(e) \leftarrow w(e) + r(u) - r(v)$. □

The idea behind Algorithm MLP is to visit the vertices of the graph, keeping track
of the longest propagation delay up to the vertex currently visited. New registers
are introduced according to a greedy criterion: whenever the longest propagation
delay exceeds the desired clock-period c, a pipeline stage is introduced. Visiting the
edges in topological sort order ensures that whenever an edge is considered, all the
preceding edges in the graph have been taken into account. Step 2.1 of the algorithm
ensures that no succeeding vertex belongs to a higher pipeline stage. Step 2.2 ensures
that whenever the longest propagation delay along a register-free path leading from a
preceding vertex to the currently visited vertex exceeds the desired clock-period c, a
new pipeline stage is introduced. Finally, step 2.3 ensures that once all the incoming
edges of a vertex v have been processed, the maximum propagation delay along a
register-free path leading from a preceding vertex to vertex v is maintained.
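The steps of Algorithm MLP translate almost line for line into code. The following Python sketch is ours, not the author's implementation, and rests on assumptions the text leaves open: edges are given as `(u, v)` pairs (all register counts are zero), a topological order of the edges is obtained by sorting them on the topological position of their tail, which places every edge into u before every edge out of u, and every delay satisfies $0 < d(v) \le c$. It also adds one rule not stated in the steps: whenever $r(v)$ is lowered, $\delta(v)$ is restarted at $d(v)$, so that $\delta(v)$ always measures delay within v's current stage; without this reset the computed stages can depend on the order in which v's incoming edges are visited. The name `mlp` is hypothetical.

```python
from graphlib import TopologicalSorter


def mlp(delay, edges, c):
    """Greedy minimum-latency pipelining, following the steps of Algorithm MLP.

    delay: dict vertex -> d(v), assumed to satisfy 0 < d(v) <= c
    edges: list of (u, v) pairs of an acyclic (combinational) circuit
    Returns (r, retimed): leads r(v) <= 0 and (u, v, w_r) triples.
    """
    preds = {v: [] for v in delay}
    for u, v in edges:
        preds[v].append(u)
    pos = {v: i for i, v in enumerate(TopologicalSorter(preds).static_order())}

    r = {v: 0 for v in delay}  # step 1: r(v) <- 0
    acc = dict(delay)          # step 1: delta(v) <- d(v)

    # Step 2: sorting by the tail's topological position visits every edge
    # into u before any edge out of u.
    for u, v in sorted(edges, key=lambda e: pos[e[0]]):
        if r[v] > r[u]:                                  # step 2.1
            r[v] = r[u]
            acc[v] = delay[v]  # assumed reset: v now sits in a new stage
        if acc[u] + delay[v] > c and r[v] >= r[u]:       # step 2.2
            r[v] = r[u] - 1
            acc[v] = delay[v]  # assumed reset: a register now precedes v
        if acc[u] + delay[v] > acc[v] and r[u] == r[v]:  # step 2.3
            acc[v] = acc[u] + delay[v]

    # Step 3: w_r(e) = w(e) + r(u) - r(v), with w(e) = 0 in a combinational circuit.
    retimed = [(u, v, r[u] - r[v]) for u, v in edges]
    return r, retimed
```

On a diamond $v_I \to \{x, y\} \to z \to v_O$ with delays 1, 2, 5, 2, 2 and c = 6, the sketch places one register on each edge into z, giving latency $r(v_I) - r(v_O) = 1$.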

Algorithm MLP terminates, since the number of edges is finite and it executes a
finite number of operations per edge. In fact the algorithm runs quickly, as is shown
by the following lemma.

Lemma 2.5 Algorithm MLP terminates in $O(E)$ steps on a circuit $G = (V, E, d, 0)$.

Proof: Steps 1 and 3 require $O(V + E)$ steps. Sorting the edges of a directed acyclic
graph in topological order requires $O(V + E)$ time [4]. In step 2 each edge is visited
exactly once and the number of operations per edge is bounded by a constant. By the time the
algorithm terminates, therefore, it has executed $O(E)$ steps, assuming $V \le E + 1$. □

In order to demonstrate the correctness of Algorithm MLP we proceed in two
stages. First we show that Algorithm MLP yields a set of values for $r(v)$ that
satisfies (2.3) and (2.4). Then, we show that this set is a single-source shortest-paths
solution in the constraint graph $G_c$, thereby ensuring, according to Lemma 2.4, that
$r(v_O) - r(v_I)$ is maximized. It follows directly that the set of values $r(v)$ is a legal
retiming that minimizes the latency $r(v_I) - r(v_O)$.

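Lemma 2.6 Algorithm MLP yields a solution that satisfies (2.3).

Proof: Only steps 2.1 and 2.2 of the algorithm change $r(v)$, and both can only decrease it.
After an edge $u \to v$ is visited, steps 2.1 and 2.2 guarantee $r(v) \le r(u)$. Since the edges
are visited in topological sort order, $r(u)$ does not change afterwards, while $r(v)$ can only
decrease further. Hence $r(v) - r(u) \le 0$ for every edge $u \to v$ in E. □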

Lemma 2.7 Algorithm MLP yields a solution that satisfies (2.4).

Proof: Assume for the sake of contradiction that for some pair of vertices $(u_0, u_k)$
there exists a path $p = u_0 \to u_1 \to \cdots \to u_{k-1} \to u_k$ in G with propagation
delay $d(p) > c$ such that $r(u_k) - r(u_0) > -1$ or, equivalently, $r(u_0) \le r(u_k)$. The
inequality $r(u_0) \le r(u_k)$ and transitive application of inequality (2.3) imply that
$r(u_i) = r(u_j)$ for all vertices $u_i, u_j \in p$. In this case step 2.3 of the algorithm ensures
that $\delta(u_{k-1}) + d(u_k) \ge d(p)$, which in turn implies that $\delta(u_{k-1}) + d(u_k) > c$. When
visiting vertex $u_k$ the algorithm detects this condition in step 2.2 and enforces $r(u_k) <
r(u_{k-1})$, which contradicts the fact that $r(u_i) = r(u_j)$ for all vertices $u_i, u_j \in p$. □

In order to show that the values $r(v)$ given by Algorithm MLP are a single-source
shortest-paths solution in the constraint graph $G_c$, we must prove two basic lemmata
first.

Lemma 2.8 At any point of the algorithm, we have $d(v) \le \delta(v) \le c$.

Proof: The relation $d(v) \le \delta(v)$ clearly holds at any point of the algorithm, since
initially $d(v) = \delta(v)$ and $\delta(v)$ is never decreased.

Now, for the second part of the inequality, observe that $\delta(v)$ increases in step 2.3
only. For the sake of contradiction assume that for some edge $u \to v$ the relation
$\delta(v) > c$ holds after the execution of step 2.3. It follows that the preconditions
$\delta(u) + d(v) > \delta(v)$ and $r(u) = r(v)$ of step 2.3 must have been satisfied prior to
its execution, and that the new value is $\delta(v) = \delta(u) + d(v) > c$. But, from the immediately
preceding step 2.2 we have that $r(u) > r(v)$, since $\delta(u) + d(v) > c$, which contradicts
the fact that $r(u) = r(v)$. □

Lemma 2.9 For every vertex v that has had all its incoming edges visited by the
algorithm we have $\Delta(v) \le c\,|r(v)| + \delta(v)$, where $\Delta(v)$ denotes the maximum possible
delay from $v_I$ to v along any path in G.

Proof: The proof is by induction on the vertices that have had all their incoming
edges visited by the algorithm. Initially, vertex $v_I$ has had all its incoming edges
trivially visited by the algorithm, since the indegree of $v_I$ is zero, and $r(v_I) = 0$.
Since the longest path from $v_I$ to itself is the trivial path with no edges, we infer that
$\Delta(v_I) = d(v_I)$. Also, we have that $\delta(v_I) = d(v_I)$. Therefore $\Delta(v_I) \le c\,|r(v_I)| + \delta(v_I)$
holds.

Now, consider the inductive step. Since the edges are visited in topological sort
order, whenever all the incoming edges of a vertex v have been visited, all the incoming
edges of the vertices $u_i$ with edges $u_i \to v$ have been visited as well. Assume for the
sake of contradiction that $\Delta(v) > c\,|r(v)| + \delta(v)$ holds after having visited all the
incoming edges $u_i \to v$ of vertex v. Then, we have:

$$d(v) + \max\{\Delta(u_i) : u_i \to v \in E\} > c\,|r(v)| + \delta(v), \tag{2.5}$$

which implies

$$d(v) + \max\{c\,|r(u_i)| + \delta(u_i) : u_i \to v \in E\} > c\,|r(v)| + \delta(v), \tag{2.6}$$

since $\Delta(u_i) \le c\,|r(u_i)| + \delta(u_i)$ by the inductive assumption. Let the maximum in the
left-hand side of (2.6) occur for $i = i'$. Now, consider the three possible orderings of
$r(u_{i'})$ and $r(v)$:

Case 1: $r(u_{i'}) < r(v)$. This case is impossible, because steps 2.1 and 2.2 of the
algorithm ensure that $r(v) \le r(u_i)$ for every edge $u_i \to v$.

Case 2: $r(u_{i'}) = r(v)$. In this case we have from (2.6):

$$\begin{aligned}
c\,|r(v)| + \delta(v) &< d(v) + \max\{c\,|r(u_i)| + \delta(u_i) : u_i \to v \in E\} \\
&= d(v) + c\,|r(u_{i'})| + \delta(u_{i'}) \\
&= d(v) + c\,|r(v)| + \delta(u_{i'}),
\end{aligned}$$

which implies that

$$\delta(v) < d(v) + \delta(u_{i'}).$$

But step 2.3, executed when the edge $u_{i'} \to v$ was visited with $r(u_{i'}) = r(v)$, ensures that
$\delta(v) \ge \delta(u_{i'}) + d(v)$, which is a contradiction.

Case 3: $r(u_{i'}) > r(v)$. In this case we have $|r(v)| \ge |r(u_{i'})| + 1$, which implies

$$\begin{aligned}
c\,|r(v)| &\ge c\,|r(u_{i'})| + c \\
&\ge c\,|r(u_{i'})| + \delta(u_{i'}) \\
&= \max\{c\,|r(u_i)| + \delta(u_i) : u_i \to v \in E\},
\end{aligned}$$

where the second inequality uses $\delta(u_{i'}) \le c$ from Lemma 2.8. Since $\delta(v) \ge d(v)$, also
from Lemma 2.8, the last inequality implies

$$c\,|r(v)| + \delta(v) \ge d(v) + \max\{c\,|r(u_i)| + \delta(u_i) : u_i \to v \in E\},$$

which contradicts inequality (2.6). □

Now, using Lemma 2.9 we can prove that the values $r(v)$ given by Algorithm
MLP are the lengths of the shortest paths in the constraint graph $G_c$ from the input
interface $v_I$.




Lemma 2.10 Let $l(v)$ be the length of the shortest path in $G_c$ from $v_I$ to v. Then
Algorithm MLP sets $r(v) = l(v)$.

Proof: Assume for the sake of contradiction that the length $l(v)$ of the actual shortest
path p in $G_c$ satisfies $l(v) < r(v) \le 0$. This inequality implies that p traverses at least
one $-1$ edge more than the shortest path indicated by Algorithm MLP. Consequently,
there exists a path p from $v_I$ to v in G with propagation delay $d(p)$ such that
$d(p) \ge c\,|l(v)| + 1$, since $d(v) \ge 1$ for every vertex $v \in V$. From Lemma 2.9, however, the
maximum possible delay $\Delta(v)$ from $v_I$ to v along any path in G satisfies

$$\begin{aligned}
\Delta(v) &\le c\,|r(v)| + \delta(v) \\
&\le c\,|r(v)| + c \\
&\le c\,(|l(v)| - 1) + c \\
&< c\,|l(v)| + 1,
\end{aligned}$$

implying $\Delta(v) < d(p)$, which is a contradiction. □

Combining Lemmata 2.4, 2.6, 2.7 and 2.10, we obtain the following theorem.

Theorem 2.11 Algorithm MLP correctly solves Problem MLP. □

This theorem completes the correctness proof of Algorithm MLP.

In summary, in this chapter we presented and proved the correctness of a greedy
strategy for pipelining combinational circuitry. The clock-period of the pipelined
circuit is guaranteed not to exceed a specified upper bound c and its latency is guaranteed
to be minimal under the given clock-period constraint. The running time of the
algorithm is directly proportional to the number of interconnections in the circuit and
is optimal within a constant factor. The given procedure is used extensively as a
subroutine of the algorithms in the following chapter.


Chapter  3 


Minimum Clock-Period
Characterization


In this chapter we give a concise characterization of the minimum feasible clock-period
of a circuit in terms of the maximum delay-to-register ratio of the directed cycles in the
circuit graph. This characterization leads to improved algorithms for various retiming
problems.

The chapter is structured as follows. Section 3.1 introduces basic definitions that
are used throughout the chapter. Section 3.2 gives an exact characterization of the
minimum feasible clock-period for unit-delay circuits and Section 3.3 gives a range of
D values for the minimum feasible clock-period of general circuits, where D is the
maximum propagation delay of the circuit components. The previous ranges known
for both cases had $O(V^2)$ values.

The algorithmic implications of the minimum feasible clock-period characterizations
are given in the three subsections of Section 3.4. Section 3.4.1 gives an $O(E)$
algorithm for minimum clock-period pipelining of unit-delay combinational circuitry.
The running time of this algorithm is optimal within a constant factor. An $O(E \lg D)$
algorithm for minimum clock-period pipelining of general combinational circuitry is also
presented in this section. Section 3.4.2 presents an $O(\min\{\sqrt{V}\,E \lg(VW),\ VE\})$
algorithm for minimum clock-period retiming of unit-delay circuitry, and an $O(VE \lg D)$
algorithm for minimum clock-period retiming of general circuitry. Finally, Section 3.4.3
gives an $O(\min\{\sqrt{V}\,E \lg(VW) \lg(VD),\ VE \lg(VD)\})$ algorithm for determining a
retiming of a general circuit such that the clock-period is approximately minimized.
3.1 Preliminaries


In  this  section  we  give  some  basic  definitions  that  we  will  use  throughout  the  rest  of 
the  chapter. 

Let $G = (V, E, d, w)$ be a circuit graph. We denote by D the maximum propagation delay of the circuit components:

$$D = \max\{d(v) : v \in V\}.$$

We define the delay-to-register ratio $R(C)$ of a cycle $C = v_0 \xrightarrow{e_0} v_1 \xrightarrow{e_1} \cdots v_{k-1} \xrightarrow{e_{k-1}} v_0$ in the circuit G as follows:

$$R(C) = \frac{\sum_{v \in C} d(v)}{\sum_{i=0}^{k-1} w(e_i)}.$$

We denote by $C^*(G)$ the directed cycle in G with maximum delay-to-register ratio. By definition, $R(C^*(G)) \ge R(C)$ for every cycle $C \in G$.

A clock-period c is called feasible for the circuit G if and only if there exists a retiming r of G such that $\Phi(G_r) \le c$. Finally, we denote by $\Phi_{\min}(G)$ the clock-period of the retimed circuit $G_r$ with the smallest possible clock-period:

$$\Phi_{\min}(G) = \min\{\Phi(G_r) : r \text{ is a retiming of } G\}.$$
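As a concrete illustration of these definitions, the following Python sketch computes $R(C)$ for every simple cycle of a small circuit graph and picks $C^*(G)$ by exhaustive enumeration. It is only a hypothetical aid for tiny examples (the enumeration is exponential in general) and is not one of the algorithms of this chapter. It assumes vertices numbered 0..V-1, edges given as (u, v, w) triples, a list d of vertex delays, and at least one register on every cycle.

```python
from fractions import Fraction

def simple_cycles(n, edges):
    """Enumerate the simple directed cycles of a graph as lists of edge indices.

    Vertices are 0..n-1 and edges is a list of (u, v, w) triples. Each cycle
    is reported once, starting from its smallest vertex. Exponential time in
    general, so this is only meant for tiny examples.
    """
    out = [[] for _ in range(n)]
    for i, (u, _v, _w) in enumerate(edges):
        out[u].append(i)
    cycles = []

    def extend(start, v, path, on_path):
        for i in out[v]:
            x = edges[i][1]
            if x == start:                      # closed a cycle back to start
                cycles.append(path + [i])
            elif x > start and x not in on_path:
                on_path.add(x)
                extend(start, x, path + [i], on_path)
                on_path.remove(x)

    for s in range(n):
        extend(s, s, [], {s})
    return cycles

def ratio(cycle, edges, d):
    """R(C): delay of the vertices on C divided by the registers on C."""
    delay = sum(d[edges[i][1]] for i in cycle)   # each vertex heads exactly one cycle edge
    regs = sum(edges[i][2] for i in cycle)
    return Fraction(delay, regs)                 # raises ZeroDivisionError if C has no register

def max_ratio_cycle(n, edges, d):
    """Return (R(C*(G)), C*(G)) by exhaustive search over simple cycles."""
    return max((ratio(c, edges, d), c) for c in simple_cycles(n, edges))

# Hypothetical example: a 4-vertex ring with a chord 2 -> 0.
DELAYS = [3, 2, 2, 2]
EDGES = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 0), (2, 0, 1)]
R_STAR, C_STAR = max_ratio_cycle(4, EDGES, DELAYS)
print(R_STAR, C_STAR)  # 3 [0, 1, 2, 3]
```

Restricting attention to simple cycles loses nothing: a non-simple cycle splits into simple ones, and its ratio is a mediant of theirs, so it cannot exceed the largest of them.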


3.2  Minimum  Period  for  Unit-Delay  Circuits 

In this section we relate the minimum clock-period $\Phi_{\min}(G)$ that we can obtain by retiming a given unit-delay circuit $G = (V, E, 1, w)$ with the maximum delay-to-register ratio $R(C^*(G))$ of the cycles C in the circuit graph G. Specifically, we show that $\Phi_{\min}(G) = \lceil R(C^*(G)) \rceil$.

The result presented in this section relies on a retiming theorem in [17], which characterizes when a unit-delay circuit has a clock-period less than or equal to c. The theorem is phrased in terms of the graph $G - 1/c$, which is defined as $G - 1/c = (V, E, d, w')$, where $w'(e) = w(e) - 1/c$ for every edge $e \in E$. Thus, $G - 1/c$ is the graph obtained from G by subtracting $1/c$ from the weight of each edge in G.
Theorem 3.1 Let $G = (V, E, 1, w)$ be a unit-delay synchronous circuit, and let c be any positive integer. Then there is a retiming r of G such that $\Phi(G_r) \le c$ if and only if $G - 1/c$ contains no cycles having negative edge-weight. □

We can use Theorem 3.1 to characterize the minimum clock period $\Phi_{\min}(G)$ in terms of the maximum delay-to-register ratio $R(C^*(G))$.

Theorem 3.2 Let $G = (V, E, 1, w)$ be a unit-delay synchronous circuit with maximum delay-to-register ratio $R(C^*(G))$. Let $\Phi_{\min}(G)$ denote the minimum clock-period that can be obtained by retiming G. Then

$$\Phi_{\min}(G) = \lceil R(C^*(G)) \rceil.$$


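Proof: According to Theorem 3.1, a clock-period c is feasible if and only if $G - 1/c$ has no negative-weight cycles. Thus, for every cycle $C = v_0 \xrightarrow{e_0} v_1 \xrightarrow{e_1} \cdots v_{k-1} \xrightarrow{e_{k-1}} v_0$ in G and any feasible clock-period c we have:

$$\sum_{i=0}^{k-1} \left(w(e_i) - 1/c\right) \ge 0.$$

Equivalently:

$$c \ge \frac{k}{\sum_{i=0}^{k-1} w(e_i)}.$$

The right-hand side of the last inequality equals $R(C)$ by definition, since every vertex of a unit-delay circuit has delay 1 and the total delay of C is therefore k. Since this inequality holds for every cycle $C \in G$, it must also hold for $C^*(G)$. Thus $c \ge R(C^*(G))$, and the integrality of c implies that $c \ge \lceil R(C^*(G)) \rceil$ for every feasible clock-period c; in particular, $\Phi_{\min}(G) \ge \lceil R(C^*(G)) \rceil$. Conversely, $c = \lceil R(C^*(G)) \rceil$ satisfies $c \ge R(C)$ for every cycle C, so the first inequality above holds for every cycle and, by Theorem 3.1, this c is feasible. Therefore $\Phi_{\min}(G) = \lceil R(C^*(G)) \rceil$. □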
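Theorem 3.1 turns feasibility into a negative-cycle test, which the following Python sketch illustrates for unit-delay circuits. It is a hedged sketch under our own conventions: weights of $G - 1/c$ are multiplied by c so that Bellman-Ford works on the integers $c \cdot w(e) - 1$, and the shortest-path source is an implicit vertex joined to every vertex by a 0-weight edge. The helper names are hypothetical.

```python
def has_negative_cycle(n, weighted_edges):
    """Bellman-Ford with an implicit source joined to every vertex by a
    0-weight edge (hence the all-zero start); True iff a negative cycle exists."""
    dist = [0] * n
    for _ in range(n):
        changed = False
        for u, v, wt in weighted_edges:
            if dist[u] + wt < dist[v]:
                dist[v] = dist[u] + wt
                changed = True
        if not changed:
            return False
    return True

def feasible_unit_delay(n, edges, c):
    """Theorem 3.1: c is feasible iff G - 1/c has no negative cycle.
    Every weight is multiplied by c, giving the integer weight c*w(e) - 1."""
    return not has_negative_cycle(n, [(u, v, c * w - 1) for u, v, w in edges])

def min_period_unit_delay(n, edges):
    """Smallest feasible c; by Theorem 3.2 this equals ceil(R(C*(G))) <= n."""
    for c in range(1, n + 1):
        if feasible_unit_delay(n, edges, c):
            return c
    raise ValueError("some cycle has no register")

# Hypothetical unit-delay example: a 5-vertex ring holding 2 registers,
# plus a chord 2 -> 0 with one register.
EDGES = [(0, 1, 1), (1, 2, 0), (2, 3, 0), (3, 4, 1), (4, 0, 0), (2, 0, 1)]
print(min_period_unit_delay(5, EDGES))  # 3 = ceil(5/2)
```

The linear scan over c is only for clarity; the point of Theorem 3.2 is that the answer can instead be read off from $R(C^*(G))$ directly.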


3.3  Minimum  Period  for  General  Circuits 

In this section we relate the minimum clock-period $\Phi_{\min}(G)$ that we can obtain by retiming a given general circuit $G = (V, E, d, w)$ with the delay-to-register ratios of the cycles in the circuit graph G and the propagation delays of the circuit components. Specifically, we show that

$$\lceil R(C^*(G)) \rceil \le \Phi_{\min}(G) \le \lceil R(C^*(G)) \rceil + D,$$
where D denotes the maximum propagation delay of the elements in the circuit, and $C^*(G)$ denotes the cycle in G with maximum delay-to-register ratio $R(C^*(G))$. Observe that both the lower and the upper bound are independent of the size of the circuit.

There is no known counterpart of Theorem 3.1 for general circuits. This is why we cannot obtain an exact characterization of $\Phi_{\min}(G)$ for general circuits in a manner similar to that of the previous section for unit-delay circuits. However, we are still able to give tight bounds for $\Phi_{\min}(G)$, which are independent of the size of the circuit.

The next theorem gives a necessary condition for a circuit to have a clock-period less than or equal to c, and will be used to derive a lower bound for $\Phi_{\min}(G)$. The theorem is phrased in terms of the graph $G - d/c$, which is defined as $G - d/c = (V, E, d, w')$, where $w'(e) = w(e) - d(v)/c$ for every edge $u \xrightarrow{e} v \in E$.

Theorem 3.3 Let $G = (V, E, d, w)$ be a synchronous circuit, and let c be any positive integer. If there is a retiming r of G such that $\Phi(G_r) \le c$, then $G - d/c$ contains no cycles having negative edge-weight.

Proof: Assume there exists a retiming r of G such that $\Phi(G_r) \le c$. Consider any cycle $C \in G_r$. For every register-free path $p = v_0 \to v_1 \to \cdots \to v_{k-1} \to v_k$ in the cycle we have $\sum_{i=0}^{k} d(v_i) \le c$. Let $w_r(C) = \sum_{e \in C} w_r(e)$ denote the number of registers in C. Then, by adding the contributions from the at most $w_r(C)$ maximal register-free paths in C, we get

$$\sum_{v \in C} d(v) \le c \sum_{e \in C} w_r(e),$$

or, equivalently,

$$\sum_{u \xrightarrow{e} v \in C} \left(w_r(e) - d(v)/c\right) \ge 0.$$

Now, recall that $w_r(e) = w(e) + r(u) - r(v)$ for every edge $u \xrightarrow{e} v \in C$. Consequently, the sum over the edges in C telescopes, yielding

$$\sum_{u \xrightarrow{e} v \in C} \left(w(e) - d(v)/c\right) \ge 0.$$

Since this statement is true for every cycle $C \in G$, we conclude that $G - d/c$ contains no cycles with negative edge-weight. □
As a direct consequence of Theorem 3.3, we have the following lower bound on the minimum feasible clock-period of a general circuit:

Corollary 3.4 Let $G = (V, E, d, w)$ be a synchronous circuit with maximum delay-to-register ratio $R(C^*(G))$, and let $\Phi_{\min}(G)$ be the minimum clock-period we can obtain by retiming G. Then

$$\lceil R(C^*(G)) \rceil \le \Phi_{\min}(G).$$

Proof: For any feasible clock-period c, Theorem 3.3 implies

$$\sum_{u \xrightarrow{e} v \in C} \left(w(e) - d(v)/c\right) \ge 0$$

for every cycle $C \in G$. Equivalently, $c \ge R(C)$ for every cycle $C \in G$, which yields $c \ge R(C^*(G))$ for $C = C^*(G)$. Since this lower bound holds for every feasible clock-period, we have $R(C^*(G)) \le \Phi_{\min}(G)$. Given that the propagation delays of the circuit components are integers, we infer that $\Phi_{\min}(G)$ must be an integer as well. Therefore $\lceil R(C^*(G)) \rceil \le \Phi_{\min}(G)$. □

Observe that the converse of Theorem 3.3 is not true. Specifically, given a circuit $G = (V, E, d, w)$, if $G - d/c$ has no negative-weight cycles, it does not follow that there exists a retiming r of the circuit such that $\Phi(G_r) \le c$. This is most easily demonstrated with an example. Consider the circuit of Figure 3.1, which is configured as a ring with three registers and four computational elements. It is impossible to obtain a retiming with clock-period c = 3, even though $R(C^*(G)) = 3$: the fourth element has delay 3 and must be separated from its neighbors by registers on both sides, which leaves only one register to be placed among the three elements of delay 2.
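The Figure 3.1 argument can be checked by brute force. The Python sketch below assumes the ring order v0 -> v1 -> v2 -> v3 -> v0 with delays 3, 2, 2, 2 (by rotational symmetry, where the delay-3 element sits does not matter). On a simple ring, every distribution of the three registers over the four edges is reachable by retiming, so enumerating distributions enumerates retimings. The helper clock_period, which computes $\Phi$ as the longest register-free path, is a hypothetical illustration and not code from the thesis.

```python
from fractions import Fraction
from itertools import product

def clock_period(d, edges):
    """Phi: the largest total delay along a register-free path.
    Assumes the zero-register edges form a DAG, as in any legal circuit."""
    zero_in = [[] for _ in d]
    for u, v, w in edges:
        if w == 0:
            zero_in[v].append(u)
    memo = {}

    def longest_into(v):  # longest register-free path ending at v, d(v) included
        if v not in memo:
            memo[v] = d[v] + max((longest_into(u) for u in zero_in[v]), default=0)
        return memo[v]

    return max(longest_into(v) for v in range(len(d)))

DELAYS = [3, 2, 2, 2]                      # ring v0 -> v1 -> v2 -> v3 -> v0
RING = [(0, 1), (1, 2), (2, 3), (3, 0)]
TOTAL_REGISTERS = 3

# Every placement of the three registers on the ring edges (all are retimings).
periods = {}
for ws in product(range(TOTAL_REGISTERS + 1), repeat=len(RING)):
    if sum(ws) == TOTAL_REGISTERS:
        edges = [(u, v, w) for (u, v), w in zip(RING, ws)]
        periods[ws] = clock_period(DELAYS, edges)

best_period = min(periods.values())
r_ring = Fraction(sum(DELAYS), TOTAL_REGISTERS)             # R(C*(G))
cycle_weight = TOTAL_REGISTERS - Fraction(sum(DELAYS), 3)   # weight of the ring in G - d/3
print(best_period, r_ring, cycle_weight)  # 4 3 0
```

The ring has weight 0 in $G - d/3$, so the condition of Theorem 3.3 holds for c = 3, yet the best reachable clock-period is 4.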

Even though the converse of Theorem 3.3 is not true, we can still find an upper bound for $\Phi_{\min}(G)$ in terms of the maximum delay-to-register ratio $R(C^*(G))$ in the circuit and the maximum propagation delay D of the circuit components.

Lemma 3.5 Let $G = (V, E, d, w)$ be a synchronous circuit with maximum delay-to-register ratio $R(C^*(G))$, and let $\Phi_{\min}(G)$ be the minimum clock-period we can obtain by retiming G. Then

$$\Phi_{\min}(G) \le \lceil R(C^*(G)) \rceil + D.$$
Figure 3.1: A synchronous circuit G with three registers and four computational elements. The propagation delay of each element is indicated in the vertex which represents it. The circuit cannot be retimed to have period c = 3, even though $G - d/c$ has no negative-weight cycles.


Proof: We will prove that $\Phi_{\min}(G) \le \lceil R(C^*(G)) \rceil + D$ by showing that G can be retimed to have clock-period $c = \lceil R(C^*(G)) \rceil + D$.

According to the mathematical programming formulation of retiming, which was given in Theorem 2.1, the circuit G can be retimed to achieve period c if we can find a set of values $r(v)$ such that

$$r(v) - r(u) \le w(e) \qquad (3.1)$$

for every edge $u \xrightarrow{e} v \in E$, and

$$r(v) - r(u) \le W(u, v) - 1 \qquad (3.2)$$

for all vertices u, v such that $D(u, v) > c$. Let

$$E_W = \{u \to v : u, v \in V \text{ and } r(v) - r(u) \text{ is constrained by (3.2)}\}.$$

The constraint sets (3.1) and (3.2) induce the constraint graph $G_c = (V, E \cup E_W, w_c)$, where

$$w_c(u \to v) = \begin{cases} w(e) & \text{if } u \xrightarrow{e} v \in E, \\ W(u, v) - 1 & \text{if } u \to v \in E_W. \end{cases}$$

According to Lemma 2.2 and Lemma 2.3, the circuit G can be retimed to achieve clock-period $\Phi(G_r) \le c$ exactly when $G_c$ has no negative-weight cycles. Let us assume, for the sake of contradiction, that $G_c$ does have a negative-weight cycle $C_c \in G_c$, which consists of two sets of edges $E_1'$ and $E_2'$, with $E_1' \subseteq E$, $E_2' \subseteq E_W$, and $|E_2'| = n_2$. Since the edge-weights are integral, we have

$$\sum_{e \in E_1'} w_c(e) + \sum_{e \in E_2'} w_c(e) \le -1. \qquad (3.3)$$
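Let

$$E_2'' = \{v_1 \xrightarrow{e} v_2 : v_1 \xrightarrow{e} v_2 \in E,\ v_1 \xrightarrow{e} v_2 \in u \rightsquigarrow v,\ u \to v \in E_2'\},$$

where $u \rightsquigarrow v$ denotes the critical path in G from u to v, that is, a path of weight $W(u, v)$ and delay $D(u, v)$. Then, according to inequality (3.3), we have:

$$\begin{aligned} \sum_{e \in E_1'} w(e) + \sum_{e \in E_2''} w(e) &= \sum_{e \in E_1'} w(e) + \sum_{u \to v \in E_2'} \left(W(u, v) - 1\right) + n_2 \\ &= \sum_{e \in E_1'} w(e) + \sum_{e \in E_2'} w_c(e) + n_2 \\ &= \sum_{e \in E_1'} w_c(e) + \sum_{e \in E_2'} w_c(e) + n_2 \\ &\le n_2 - 1. \end{aligned}$$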


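Now, for the delay-to-register ratio of the cycle $C' \in G$ which consists of the edges $E_1' \cup E_2''$, we have:

$$\begin{aligned} R(C') &= \frac{\sum_{u \xrightarrow{e} v \in E_1'} d(v) + \sum_{u \xrightarrow{e} v \in E_2''} d(v)}{\sum_{e \in E_1'} w(e) + \sum_{e \in E_2''} w(e)} \\ &\ge \frac{\sum_{u \xrightarrow{e} v \in E_1'} d(v) + \sum_{u \xrightarrow{e} v \in E_2''} d(v)}{n_2 - 1} \\ &\ge \frac{n_2 (c - D)}{n_2 - 1} \\ &= \frac{n_2}{n_2 - 1} \lceil R(C^*(G)) \rceil \\ &\ge \frac{n_2}{n_2 - 1} R(C^*(G)). \end{aligned}$$

The second inequality holds because each of the $n_2$ critical paths $u \rightsquigarrow v$ has delay $D(u, v) > c$, of which its first vertex u accounts for at most D.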


Since $n_2/(n_2 - 1) > 1$, we conclude that there exists a cycle in G with delay-to-register ratio greater than the maximum delay-to-register ratio $R(C^*(G))$, which is a contradiction. Therefore, $G_c$ has no negative-weight cycles, and $c = \lceil R(C^*(G)) \rceil + D$ is a feasible clock-period. Consequently, $\Phi_{\min}(G) \le \lceil R(C^*(G)) \rceil + D$. □

Corollary 3.4 and Lemma 3.5 imply the following.


Theorem 3.6 Let $G = (V, E, d, w)$ be a synchronous circuit with maximum delay-to-register ratio $R(C^*(G))$, and let $\Phi_{\min}(G)$ be the minimum clock-period we can obtain by retiming G. Then

$$\lceil R(C^*(G)) \rceil \le \Phi_{\min}(G) \le \lceil R(C^*(G)) \rceil + D.$$
Circuit Type         Transformation             Running Time

UD Combinational     Min Clock-Period           $O(E)$
Combinational        Min Clock-Period           $O(E \lg D)$
UD Sequential        Min Clock-Period           $O(\min\{V^{1/2} E \lg(VW), VE\})$
Sequential           Min Clock-Period           $O(VE \lg D)$
Sequential           Approx Min Clock-Period    $O(\min\{V^{1/2} E \lg(VW) \lg(VD), VE \lg(VD)\})$

Figure 3.2: Summary of problems and running times of the corresponding algorithms. The initials UD denote unit-delay circuitry.


3.4  Algorithmic  Implications 

In this section we study the algorithmic implications of the minimum clock-period characterization for a variety of retiming problems. We use the ideas of the previous sections to develop fast algorithms for minimum clock-period pipelining of combinational circuitry. We show how to obtain improved running times for clock-period minimization of sequential circuits, using known graph-theoretic algorithms. Finally, we give a faster algorithm for approximate clock-period minimization of general sequential circuits. The problems studied in this section, along with the running times of the corresponding algorithms, are summarized in Figure 3.2.

3.4.1  Minimum  clock-period  pipelining 

We use the ideas of the previous sections to develop fast algorithms for minimum clock-period pipelining of combinational circuitry. Specifically, we give an optimal $O(E)$ algorithm for minimum clock-period pipelining of unit-delay combinational circuitry and an $O(E \lg D)$ algorithm for minimum clock-period pipelining of general combinational circuitry.

Let us consider unit-delay circuitry first. The problem of minimum clock-period pipelining is defined as follows: Given a unit-delay combinational circuit $G = (V, E, 1, 0)$ and a positive integer $l$, determine a retiming r such that $G_r$ is a pipelined combinational circuit with latency no greater than $l$ and with minimum clock-period. The following lemma characterizes the minimum feasible clock-period in terms of the longest propagation delay $\Delta$ of a path in the circuit and the latency $l$.
Lemma 3.7 Let $G = (V, E, 1, 0)$ be a unit-delay combinational circuit with input interface $v_I$ and output interface $v_O$. Let $\Delta$ be the number of vertices in the longest path $p_\Delta = v_I \rightsquigarrow v_O$ in G, and let $l$ be a positive integer. Then the minimum clock-period $\Phi_{\min}(G)$ for any pipelined version of G with latency $l$ is

$$\Phi_{\min}(G) = \left\lceil \frac{\Delta}{l + 1} \right\rceil.$$

Proof: Any retiming r of the circuit that gives a pipelined version of the circuit with latency $l$ satisfies constraints (2.1) and (2.2) as well as a latency constraint. Specifically, it satisfies

$$r(v) - r(u) \le 0$$

for every edge $u \to v$ in E,

$$r(v) - r(u) \le -1$$

for all vertices $u, v \in V$ such that $D(u, v) > c$, and

$$r(v_I) - r(v_O) \le l.$$

This set of inequalities induces the constraint graph $G_c = (V_c, E_c, w_c)$, and according to Lemma 2.2 and Lemma 2.3 it is feasible if and only if there exists no negative-weight cycle in $G_c$. We shall use this statement to show that $\Phi_{\min}(G) = \lceil \Delta/(l + 1) \rceil$.

First we show that $\lceil \Delta/(l + 1) \rceil$ is a lower bound for $\Phi_{\min}(G)$. Let r be a feasible retiming of the circuit with latency $l$ and clock-period c. Every path in $G_r$ from $v_I$ to $v_O$ has $l + 1$ register-free parts. Consider the longest such path, with delay $\Delta$. Adding up all the contributions yields $\Delta \le c(l + 1)$, which implies $c \ge \Delta/(l + 1)$. Therefore, $\Phi_{\min}(G) \ge \Delta/(l + 1)$, or $\Phi_{\min}(G) \ge \lceil \Delta/(l + 1) \rceil$, since $\Phi_{\min}(G)$ must be an integer.

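Now, we prove that $\lceil \Delta/(l + 1) \rceil$, the lower bound of $\Phi_{\min}(G)$, is a feasible clock-period, thus establishing the desired equality. In order to prove feasibility of the lower bound, it suffices to show that $G_c$ has no negative-weight cycles for $c = \lceil \Delta/(l + 1) \rceil$. Every cycle of $G_c$ must use the edge of weight $l$ induced by the latency constraint, since all other constraint edges follow the direction of the acyclic circuit. Hence, since the maximum number of $-1$ edges in any path is

$$\left\lfloor \frac{\Delta - 1}{\lceil \Delta/(l + 1) \rceil} \right\rfloor,$$

it suffices to show that

$$\left\lfloor \frac{\Delta - 1}{\lceil \Delta/(l + 1) \rceil} \right\rfloor \le l.$$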
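We have

$$\left\lfloor \frac{\Delta - 1}{\lceil \Delta/(l + 1) \rceil} \right\rfloor \le \left\lfloor \frac{\Delta - 1}{\Delta/(l + 1)} \right\rfloor = \left\lfloor \frac{(l + 1)(\Delta - 1)}{\Delta} \right\rfloor = \left\lfloor (l + 1)(1 - 1/\Delta) \right\rfloor = \left\lfloor l + 1 - (l + 1)/\Delta \right\rfloor \le l,$$

since $(l + 1)/\Delta > 0$. Therefore, $G_c$ has no negative-weight cycles, which implies that $\lceil \Delta/(l + 1) \rceil$ is a feasible clock-period in addition to being a lower bound for $\Phi_{\min}(G)$. Therefore $\Phi_{\min}(G) = \lceil \Delta/(l + 1) \rceil$. □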
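The inequality at the heart of this proof is easy to sanity-check numerically. The following Python sketch is an illustration only (it confirms nothing beyond the tested range): it evaluates the bound $\lfloor (\Delta - 1)/\lceil \Delta/(l+1) \rceil \rfloor$ on the number of $-1$ edges for every $\Delta$ and $l$ in a finite range, using function names of our own choosing.

```python
def ceil_div(a, b):
    """ceil(a / b) for integers a >= 0, b > 0."""
    return -(-a // b)

def max_minus_one_edges(delta, l):
    """floor((Delta - 1) / ceil(Delta / (l + 1))): the bound on the number
    of -1 edges along any path used in the proof of Lemma 3.7."""
    return (delta - 1) // ceil_div(delta, l + 1)

def bound_holds(max_delta, max_l):
    """Exhaustively confirm the inequality of Lemma 3.7 on a finite range."""
    return all(max_minus_one_edges(delta, l) <= l
               for delta in range(1, max_delta + 1)
               for l in range(1, max_l + 1))

print(bound_holds(500, 100))       # True
print(max_minus_one_edges(10, 2))  # 2: the bound is met with equality here
```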

Now, we give the following algorithm for the problem of minimum clock-period pipelining. The correctness of the algorithm follows from Theorem 2.11 and Lemma 3.7.

Algorithm UDMPP (Unit-Delay Minimum Period Pipelining) Given a unit-delay combinational circuit $G = (V, E, 1, 0)$ with input interface $v_I$ and output interface $v_O$, and a positive integer $l$, determine a retiming r such that $G_r$ is a pipelined combinational circuit with latency $l$ and minimum clock-period.

1. Determine the number of vertices $\Delta$ in the longest path in G from $v_I$ to $v_O$.

2. Run Algorithm MLP on G with clock-period $\lceil \Delta/(l + 1) \rceil$. □

The algorithm terminates in $O(E)$ steps, since Step 1 is a depth-first search in the graph and Algorithm MLP runs in $O(E)$ steps.
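Steps 1 and 2 of Algorithm UDMPP can be sketched in Python as follows. The sketch computes only the target clock-period; Algorithm MLP itself (Chapter 2) is not reproduced. It scans vertices in topological order (Kahn's algorithm) rather than by the depth-first search mentioned above; both run in linear time. The vertex numbering, the edge-list format and the function names are assumptions made for illustration.

```python
from collections import deque

def longest_path_vertices(n, edges, src, dst):
    """Number of vertices on a longest src -> dst path of a DAG, in O(V + E).
    Vertices are processed in topological order (Kahn's algorithm)."""
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
    best = [None] * n          # None: not (yet) reached from src
    best[src] = 1
    queue = deque(v for v in range(n) if indeg[v] == 0)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if best[u] is not None and (best[v] is None or best[u] + 1 > best[v]):
                best[v] = best[u] + 1
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return best[dst]

def udmpp_period(n, edges, src, dst, l):
    """Clock-period that Step 2 of UDMPP hands to Algorithm MLP."""
    delta = longest_path_vertices(n, edges, src, dst)
    return -(-delta // (l + 1))        # ceil(Delta / (l + 1)), Lemma 3.7

# Hypothetical circuit: v_I = 0, v_O = 5; the longest path 0-1-2-3-5 has 5 vertices.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 5), (0, 4), (4, 5)]
print(udmpp_period(6, EDGES, 0, 5, 1), udmpp_period(6, EDGES, 0, 5, 2))  # 3 2
```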

Now, we consider the case of general combinational circuitry. The problem of minimum clock-period pipelining is defined in an analogous way: Given a combinational circuit $G = (V, E, d, 0)$ and a positive integer $l$, determine a retiming r such that $G_r$ is a pipelined combinational circuit with latency no greater than $l$ and minimum clock-period. The following lemma characterizes the minimum feasible clock-period in terms of the delay $\Delta$ of the longest path in the circuit, the latency $l$, and the longest component delay D.


Lemma 3.8 Let $G = (V, E, d, 0)$ be a combinational circuit with input interface $v_I$ and output interface $v_O$. Let $\Delta$ be the delay of the path $p_\Delta = v_I \rightsquigarrow v_O$ in G with the longest propagation delay, and let $l$ be a positive integer. Then the minimum clock-period $\Phi_{\min}(G)$ for any pipelined version of G with latency $l$ satisfies:

$$\left\lceil \frac{\Delta}{l + 1} \right\rceil \le \Phi_{\min}(G) \le \left\lceil \frac{\Delta}{l + 1} \right\rceil + D,$$

where D is the longest component delay in the circuit.

Proof: Any retiming r of the circuit that gives a pipelined version of the circuit with latency $l$ satisfies constraints (2.3) and (2.4) as well as a latency constraint. Specifically, it satisfies

$$r(v) - r(u) \le 0$$

for every edge $u \to v$ in E,

$$r(v) - r(u) \le -1$$

for all vertices $u, v \in V$ such that $D(u, v) > c$, and

$$r(v_I) - r(v_O) \le l.$$

First, we derive the lower bound of the inequality. Consider the constraint graph $G_c$ induced by the above constraints. Let r be a feasible retiming of the circuit with latency $l$ and clock-period c. Every path in $G_r$ from $v_I$ to $v_O$ has $l + 1$ register-free parts. Adding up all the delays of the register-free parts along the longest such path $p_\Delta$ yields $\Delta \le c(l + 1)$, which implies $c \ge \Delta/(l + 1)$. Therefore, $\Phi_{\min}(G) \ge \Delta/(l + 1)$, or $\Phi_{\min}(G) \ge \lceil \Delta/(l + 1) \rceil$, since $\Phi_{\min}(G)$ must be an integer as a consequence of the fact that $d(v) \in \mathbb{Z}$ for every vertex $v \in V$.

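Now, we establish the upper bound of the inequality by proving that $\lceil \Delta/(l + 1) \rceil + D$ is a feasible clock-period. In order to achieve this, it suffices to show that $G_c$ has no negative-weight cycles for $c = \lceil \Delta/(l + 1) \rceil + D$. The maximum number of $-1$ edges in any path is

$$\left\lfloor \frac{\Delta - 1}{\left(\lceil \Delta/(l + 1) \rceil + D\right) - D} \right\rfloor = \left\lfloor \frac{\Delta - 1}{\lceil \Delta/(l + 1) \rceil} \right\rfloor.$$

But we have already shown in Lemma 3.7 that

$$\left\lfloor \frac{\Delta - 1}{\lceil \Delta/(l + 1) \rceil} \right\rfloor \le l.$$

Hence $G_c$ has no negative-weight cycles. □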
We give an $O(E \lg D)$ algorithm for minimum period pipelining of combinational circuitry. Its correctness follows from Theorem 2.11 and Lemma 3.8.

Algorithm MPP (Minimum Period Pipelining) Given a combinational circuit $G = (V, E, d, 0)$ with input interface $v_I$ and output interface $v_O$, and a positive integer $l$, determine a retiming r such that $G_r$ is a pipelined combinational circuit with latency $l$ and minimum clock-period.

1. Determine the delay $\Delta$ of the longest path in G from $v_I$ to $v_O$.

2. Binary search among the D possible values of $\Phi_{\min}(G)$ given by Lemma 3.8, applying Algorithm MLP on G. □

Step 1 is a depth-first search in G and Step 2 performs $O(\lg D)$ applications of Algorithm MLP. Therefore, Algorithm MPP terminates in $O(E \lg D)$ steps.
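The binary search of Step 2 is sketched below in Python. Algorithm MLP is defined in Chapter 2 and not reproduced here, so the sketch substitutes a hypothetical stand-in, chain_min_latency, that is valid only when the combinational circuit is a simple chain of components. For a chain, greedily packing consecutive components into register-free stages of delay at most c yields the fewest stages, and the latency is one less than the number of stages. The search range is the one given by Lemma 3.8; feasibility is monotone in c, which is what makes binary search valid.

```python
def chain_min_latency(delays, c):
    """Hypothetical stand-in for Algorithm MLP, valid only for a simple chain.

    Greedily packs consecutive components into register-free stages whose
    total delay is at most c; returns the latency (number of stages - 1),
    or None if some single component is slower than c.
    """
    if max(delays) > c:
        return None
    stages, load = 1, 0
    for d in delays:
        if load + d > c:       # start a new stage: one more register on the path
            stages += 1
            load = 0
        load += d
    return stages - 1

def mpp_chain(delays, l):
    """Binary search of Step 2 over the range given by Lemma 3.8."""
    delta, dmax = sum(delays), max(delays)
    lo = -(-delta // (l + 1))   # ceil(Delta / (l + 1)): lower bound
    hi = lo + dmax              # always feasible by Lemma 3.8
    while lo < hi:
        mid = (lo + hi) // 2
        latency = chain_min_latency(delays, mid)
        if latency is not None and latency <= l:
            hi = mid
        else:
            lo = mid + 1
    return lo

DELAYS = [3, 1, 4, 1, 5, 9, 2, 6]   # hypothetical component delays along the chain
print(mpp_chain(DELAYS, 2), mpp_chain(DELAYS, 3))  # 14 9
```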

3.4.2  Minimum  clock-period  retiming 

In this section we study the implications of the minimum period characterization for retiming of sequential circuitry. Specifically, we consider the problem: Given a sequential circuit $G = (V, E, d, w)$, determine a retiming r such that $\Phi(G_r)$ is minimum.

We consider unit-delay circuitry first. In order to compute the minimum feasible period of the circuit, we can use Karp's $O(VE)$ algorithm for finding minimum mean cycles in a graph [11]. Then, using Bellman-Ford's shortest-paths algorithm on $G - 1/c$, we can find a retiming r such that $\Phi(G_r)$ is minimum, according to Theorem 3.1. The overall running time is $O(VE)$, which is an improvement over the best previously known strongly polynomial algorithm by a $\lg V$ factor, since it eliminates the need for binary search. Using scaling, we obtain an $O(V^{1/2} E \lg(VW))$ algorithm for the same problem, where W is the maximum register count among the edges. This algorithm utilizes Orlin-Ahuja's $O(V^{1/2} E \lg(VW))$ algorithm for minimum mean cycles [20], followed by Gabow-Tarjan's $O(V^{1/2} E \lg(VW))$ scaling algorithm for shortest paths [8].
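For unit-delay circuits, $R(C)$ is the number of vertices of C divided by its register count, so $C^*(G)$ is exactly a cycle of minimum mean register count per edge $\mu^*$, and $R(C^*(G)) = 1/\mu^*$. The Python sketch below follows Karp's $O(VE)$ recurrence under our own conventions (an implicit source realized by setting $D_0(v) = 0$ for every vertex, and exact rational arithmetic); it is an illustration, not the implementation used in the thesis. The subsequent Bellman-Ford step that extracts the retiming from $G - 1/c$ is omitted.

```python
import math
from fractions import Fraction

INF = float("inf")

def karp_min_mean(n, edges):
    """Minimum mean weight of a cycle (Karp's O(VE) method) for edges (u, v, w).

    D[k][v] is the least weight of a walk with exactly k edges ending at v;
    D[0][v] = 0 for every v plays the role of a source joined to all vertices.
    Returns None when the graph has no cycle.
    """
    D = [[INF] * n for _ in range(n + 1)]
    D[0] = [0] * n
    for k in range(1, n + 1):
        for u, v, w in edges:
            if D[k - 1][u] + w < D[k][v]:
                D[k][v] = D[k - 1][u] + w
    best = None
    for v in range(n):
        if D[n][v] == INF:
            continue
        worst = max(Fraction(D[n][v] - D[k][v], n - k)
                    for k in range(n) if D[k][v] != INF)
        if best is None or worst < best:
            best = worst
    return best

def min_period_unit_delay_karp(n, edges):
    """Theorem 3.2 with R(C*(G)) = 1 / (minimum mean register count)."""
    mu = karp_min_mean(n, edges)
    if mu is None or mu == 0:
        raise ValueError("need a cycle, and every cycle must hold a register")
    return math.ceil(1 / mu)

# Hypothetical example: 5-vertex ring with 2 registers plus a chord 2 -> 0.
EDGES = [(0, 1, 1), (1, 2, 0), (2, 3, 0), (3, 4, 1), (4, 0, 0), (2, 0, 1)]
print(karp_min_mean(5, EDGES), min_period_unit_delay_karp(5, EDGES))  # 2/5 3
```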

For general circuits we obtain an $O(VE \lg D)$ running time by using the general retiming algorithm described in [17] to binary search the range of the D possible values for the clock-period of the circuit. An interesting open question is whether we can obtain a better running time by using scaling.
3.4.3 Approximately minimum clock-period retiming

In this section we give an algorithm for determining a retiming of a general circuit such that the clock-period is approximately minimized. Specifically, we consider the following problem: Given a sequential circuit $G = (V, E, d, w)$, determine a retiming r such that $\Phi(G_r) \le \Phi_{\min}(G) + D \le 2\Phi_{\min}(G)$, where D is the maximum propagation delay of the circuit components. (The second inequality holds because no retiming can bring the clock-period below the delay D of a single component.) We show that, using scaling, this problem can be solved faster than minimum clock-period retiming by a factor of $\sqrt{V} \lg D / (\lg(VW) \lg(VD))$.

The algorithm for approximately minimum clock-period retiming is based on the lemma that follows. We denote by $G - d/c$ the graph with vertex set V, edge set E, and edge weight $w(e) - d(v)/c$ on each edge $u \xrightarrow{e} v \in E$.


Lemma 3.9 Let $G = (V, E, d, w)$ be a circuit graph with maximum delay-to-register ratio $R(C^*(G))$, and let $\Phi_{\min}(G)$ be the minimum clock-period we can obtain by retiming G. Moreover, let $n = \lceil R(C^*(G)) \rceil$, and let $l(v)$ be the solutions of a single-source shortest-paths problem on $G - d/n$. Then the assignment $r(v) = \lceil l(v) \rceil$ for each vertex $v \in V$ is a retiming of G such that

$$\Phi(G_r) \le \Phi_{\min}(G) + D. \qquad (3.4)$$


Proof: Note that the shortest-path lengths $l(v)$ are well-defined, since $G - d/\lceil R(C^*(G)) \rceil$ has no negative-weight cycles (because $\lceil R(C^*(G)) \rceil \ge R(C)$ for every cycle C). In order to prove that $r(v) = \lceil l(v) \rceil$ is a legal retiming with clock-period $\Phi(G_r) \le \Phi_{\min}(G) + D$, we show that it satisfies constraints (2.1) and (2.2) with $c = \lceil R(C^*(G)) \rceil + D$. Then, we conclude inequality (3.4) directly from Corollary 3.4.

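First, we prove that $r(v) = \lceil l(v) \rceil$ for each $v \in V$ satisfies constraints (2.1). For every edge $u \xrightarrow{e} v$ we have:

$$\lceil l(v) \rceil - \lceil l(u) \rceil \le \lceil l(v) - l(u) \rceil \le \lceil w(e) - d(v)/n \rceil \le \lceil w(e) \rceil = w(e),$$

since $\lceil x \rceil - \lceil y \rceil \le \lceil x - y \rceil$ for all real x, y, and $w(e)$ is an integer. Therefore, $\lceil l(v) \rceil$ satisfies (2.1).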
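Now, we prove that the assignment $r(v) = \lceil l(v) \rceil$ for each $v \in V$ satisfies constraints (2.2). Consider any path $p = u_0 \xrightarrow{e_0} u_1 \xrightarrow{e_1} \cdots \xrightarrow{e_{k-2}} u_{k-1} \xrightarrow{e_{k-1}} u_k$ with delay $\sum_{i=0}^{k} d(u_i) > c$, and let $w(p) = \sum_{i=0}^{k-1} w(e_i)$. Since delays are integers, $\sum_{i=0}^{k} d(u_i) \ge c + 1$, and for this path we have:

$$\sum_{i=1}^{k} \frac{d(u_i)}{n} \ge \frac{(c + 1) - d(u_0)}{n} = \frac{\lceil R(C^*(G)) \rceil + D + 1 - d(u_0)}{\lceil R(C^*(G)) \rceil} > 1,$$

since $D + 1 - d(u_0) \ge 1$. Therefore,

$$\lceil l(u_k) \rceil - \lceil l(u_0) \rceil \le \lceil l(u_k) - l(u_0) \rceil \le \left\lceil w(p) - \sum_{i=1}^{k} \frac{d(u_i)}{n} \right\rceil \le w(p) - 1,$$

where the last step uses the integrality of $w(p)$. Applying this to a critical path from $u_0$ to $u_k$, for which $w(p) = W(u_0, u_k)$, shows that $\lceil l(v) \rceil$ satisfies constraints (2.2).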

Therefore, the assignment of lead $\lceil l(v) \rceil$ to each vertex $v \in V$ yields a legal retiming with clock-period $\Phi(G_r) \le \lceil R(C^*(G)) \rceil + D$. From Corollary 3.4 it follows that $\Phi(G_r) \le \Phi_{\min}(G) + D$. □

The algorithm for approximately minimum clock-period retiming is based on Lemma 3.9 and proceeds as follows:

Algorithm ApproxCPM (Approximate Clock-Period Minimization) Given a circuit $G = (V, E, d, w)$ with maximum delay-to-register ratio $R(C^*(G))$, minimum feasible clock-period $\Phi_{\min}(G)$ and maximum component delay D, determine a retiming r of the circuit such that $\Phi(G_r) \le \lceil R(C^*(G)) \rceil + D \le \Phi_{\min}(G) + D$.

1. Compute $n = \lceil R(C^*(G)) \rceil$ by binary searching in the range $[1, \ldots, VD]$.

2. Let $l(v)$ be the lengths of the shortest paths in $G - d/n$ from some source vertex $s \in V$.

3. Set $r(v) = \lceil l(v) \rceil$ for every vertex $v \in V$. □

Step 1 of the algorithm binary searches for the smallest integer n that is no less than the maximum delay-to-register ratio $R(C^*(G))$. This ratio is positive and cannot exceed VD, since the maximum propagation delay of the circuit components is D and every simple cycle in the circuit has at most V vertices and at least one register. Each one of the $O(\lg(VD))$ iterations of the binary search checks for negative-weight cycles in $G - d/n$. The value of $\lceil R(C^*(G)) \rceil$ equals the smallest integer n in the range that induces no negative-weight cycles in $G - d/n$ [12]. Negative-weight cycles can be detected in $O(VE)$ steps using Bellman-Ford's algorithm [12], or in $O(V^{1/2} E \lg(VW))$ steps, where W is the maximum register count along any connection in the circuit, using Gabow-Tarjan's shortest-paths algorithm [8]. Step 2 requires a single-source shortest-paths computation, and Step 3 terminates in $O(V)$ steps. Step 1 of the algorithm dominates the total running time, yielding an $O(\min\{V^{1/2} E \lg(VW) \lg(VD), VE \lg(VD)\})$ running time overall.
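Algorithm ApproxCPM can be sketched in Python as below. Two conventions are our own assumptions: $G - d/n$ is multiplied by n so that Bellman-Ford works on the integer weights $n \cdot w(e) - d(v)$, and the single source of Step 2 is an implicit vertex joined to every vertex by a 0-weight edge, so that every $l(v)$ is finite. The plain Bellman-Ford variant (the $O(VE \lg(VD))$ bound) is used rather than Gabow-Tarjan scaling. The usage lines retime a hypothetical ring like the one in Figure 3.1 whose three registers all start on one edge, using the convention $w_r(e) = w(e) + r(u) - r(v)$ of the proof of Theorem 3.3.

```python
def bellman_ford(n, edges):
    """Shortest-path lengths from an implicit source joined to every vertex
    by a 0-weight edge; returns None if a negative cycle exists."""
    dist = [0] * n
    for _ in range(n):
        changed = False
        for u, v, wt in edges:
            if dist[u] + wt < dist[v]:
                dist[v] = dist[u] + wt
                changed = True
        if not changed:
            return dist
    return None

def scaled(edges, d, n):
    """G - d/n with every weight multiplied by n: edge u -> v gets n*w(e) - d(v)."""
    return [(u, v, n * w - d[v]) for u, v, w in edges]

def approx_cpm(d, edges):
    V, D = len(d), max(d)
    lo, hi = 1, V * D                    # ceil(R(C*(G))) lies in [1, VD]
    while lo < hi:                       # Step 1: binary search
        mid = (lo + hi) // 2
        if bellman_ford(V, scaled(edges, d, mid)) is None:
            lo = mid + 1                 # negative cycle: mid is too small
        else:
            hi = mid
    n = lo
    dist = bellman_ford(V, scaled(edges, d, n))   # Step 2 (lengths scaled by n)
    r = [-(-x // n) for x in dist]                 # Step 3: r(v) = ceil(l(v))
    return n, r

def clock_period(d, edges):
    """Longest register-free path delay (zero-register edges assumed acyclic)."""
    zero_in = [[] for _ in d]
    for u, v, w in edges:
        if w == 0:
            zero_in[v].append(u)
    memo = {}

    def longest_into(v):
        if v not in memo:
            memo[v] = d[v] + max((longest_into(u) for u in zero_in[v]), default=0)
        return memo[v]

    return max(longest_into(v) for v in range(len(d)))

DELAYS = [3, 2, 2, 2]
EDGES = [(0, 1, 0), (1, 2, 0), (2, 3, 0), (3, 0, 3)]   # all registers on one edge
n_star, r = approx_cpm(DELAYS, EDGES)
retimed = [(u, v, w + r[u] - r[v]) for u, v, w in EDGES]   # w_r(e) = w(e) + r(u) - r(v)
print(n_star, r, clock_period(DELAYS, retimed))  # 3 [0, 0, -1, -2] 5
```

Here the original period is 9, the retimed period is 5, and the guarantee of Lemma 3.9 is $\lceil R(C^*(G)) \rceil + D = 6$.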

In summary, in this chapter we presented a novel and concise characterization of the minimum clock-period that can be obtained by retiming a synchronous circuit G, in terms of the maximum delay-to-register ratio of the cycles in the circuit graph and the maximum propagation delay of the circuit components. Based on the ideas behind this characterization, we gave an optimal algorithm for minimum clock-period pipelining of unit-delay combinational circuitry and an efficient algorithm for minimum clock-period pipelining of general combinational circuitry. We also gave improved algorithms for minimum clock-period retiming of unit-delay and general circuitry. Finally, we described a technique which yields a retiming whose clock-period does not exceed the minimum by more than a factor of 2 and which is asymptotically faster than the known algorithms for minimum clock-period retiming.


Chapter  4 


The  Closed  Semiring  Structure 
of  Retiming 


This chapter investigates group-theoretic properties of retiming on unit-delay circuitry. Specifically, we show that retiming of unit-delay circuitry can be described in terms of a closed semiring. The three sections of this chapter are organized as follows. In Section 4.1 we review the notion of a closed semiring. In Section 4.2 we construct the closed semiring that captures the structure of unit-delay circuitry retiming. Finally, in Section 4.3 we utilize the additive and multiplicative operations of the closed semiring in order to design an $O(VE)$ algorithm for unit-delay circuitry retiming.

4.1  Preliminaries 

In this section we review the notion of a closed semiring. A more detailed exposition can be found in [4].

Let S be a set of elements, and let $\oplus$ and $\odot$ be binary operations on S. A system $(S, \oplus, \odot, \mathbf{0}, \mathbf{1})$ is a closed semiring if it satisfies the following properties:

1. $(S, \oplus, \mathbf{0})$ is a monoid:

• S is closed under $\oplus$: $a \oplus b \in S$ for all $a, b \in S$.

• $\oplus$ is associative: $(a \oplus b) \oplus c = a \oplus (b \oplus c)$ for all $a, b, c \in S$.

• $\mathbf{0}$ is an identity element for $\oplus$: $a \oplus \mathbf{0} = a$ for all $a \in S$.

2. $(S, \odot, \mathbf{1})$ is a monoid:

• S is closed under $\odot$: $a \odot b \in S$ for all $a, b \in S$.

• $\odot$ is associative: $(a \odot b) \odot c = a \odot (b \odot c)$ for all $a, b, c \in S$.

• $\mathbf{1}$ is an identity element for $\odot$: $a \odot \mathbf{1} = a$ for all $a \in S$.

3. $\mathbf{0}$ is an annihilator: $\mathbf{0} \odot a = \mathbf{0}$ for all $a \in S$.

4. $\oplus$ is commutative: $a \oplus b = b \oplus a$ for all $a, b \in S$.

5. $\oplus$ is idempotent: $a \oplus a = a$ for all $a \in S$.

6. $\odot$ distributes over $\oplus$: $a \odot (b \oplus c) = (a \odot b) \oplus (a \odot c)$ for all $a, b, c \in S$.

7. For any infinite, countable sequence $a_1, \ldots, a_i, \ldots$ the sum $a_1 \oplus \cdots \oplus a_i \oplus \cdots$ exists and is unique. Associativity, commutativity and idempotence apply to infinite as well as finite sums.

8. $\odot$ distributes over countably infinite sums.


4.2  The  Closed  Semiring  Construction 

In  this  section  we  present  the  closed  semiring  construction  which  captures  unit-delay 
retiming  on  the  original  circuit  graph. 

We define the set S as follows:

$$S = \{(r, d) : r \in \mathbb{N},\ d \in \{0, 1, \ldots, c - 1\}\} \cup \{(\infty, \infty)\}.$$

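We denote the additive operation by MIN and define it as follows:

$$(r_1, d_1)\ \mathrm{MIN}\ (r_2, d_2) = \left(\min\{r_1, r_2\},\ f(d_1, d_2; r_1, r_2)\right),$$

where

$$\min\{r_1, r_2\} = \begin{cases} r_1 & \text{if } r_1 \le r_2, \\ r_2 & \text{if } r_1 > r_2; \end{cases}$$

and

$$f(d_1, d_2; r_1, r_2) = \begin{cases} d_1 & \text{if } r_1 < r_2, \\ \max\{d_1, d_2\} & \text{if } r_1 = r_2, \\ d_2 & \text{if } r_1 > r_2. \end{cases}$$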


We denote the multiplicative operation by $\odot$ and define it as follows:

$$(r_1, d_1) \odot (r_2, d_2) = \left(r_1 + r_2 - (d_1 + d_2) \operatorname{div} c,\ d_1 +_c d_2\right),$$




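where

$$(d_1 + d_2) \operatorname{div} c = \left\lfloor \frac{d_1 + d_2}{c} \right\rfloor,$$

$$d_1 +_c d_2 = \begin{cases} (d_1 + d_2) \bmod c & \text{if } d_1 \text{ and } d_2 \text{ are finite},\\ \infty & \text{if } d_1 \text{ or } d_2 \text{ is infinite}; \end{cases}$$

and $+$ is the ordinary addition between integers.

The identity element for the additive operation is $\bar{0} \stackrel{\text{def}}{=} (\infty, \infty)$.

The identity element for the multiplicative operation is $\bar{1} \stackrel{\text{def}}{=} (0, 0)$.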
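The two operations are simple enough to state in a few lines of code. The Python sketch below is an illustration under our own assumptions, not the thesis's implementation: finite labels are integer pairs, $\bar{0}$ is represented as `(math.inf, math.inf)`, and the helper names `I`, `sr_min`, `sr_mul`, and `value` are hypothetical. The helper `value` maps a finite pair $(r, d)$ to the rational number $r - d/c$; under this reading MIN selects the pair of smaller value and $\odot$ adds values, which is the correspondence Lemma 4.1 relies on.

```python
import math
from fractions import Fraction

# Assumed representation (ours): finite labels are int pairs (r, d) with 0 <= d < c;
# the additive identity 0-bar is (inf, inf), the multiplicative identity 1-bar is (0, 0).
INF = math.inf
ZERO = (INF, INF)
ONE = (0, 0)


def I(d1, d2, r1, r2):
    """Second coordinate of MIN: d of the pair with smaller r, max of the d's on a tie."""
    if r1 < r2:
        return d1
    if r1 > r2:
        return d2
    return max(d1, d2)


def sr_min(a, b):
    """Additive operation MIN((r1, d1), (r2, d2)) = (min{r1, r2}, I(d1, d2; r1, r2))."""
    (r1, d1), (r2, d2) = a, b
    return (min(r1, r2), I(d1, d2, r1, r2))


def sr_mul(a, b, c):
    """Multiplicative operation: (r1 + r2 - (d1 + d2) div c, (d1 + d2) mod c)."""
    (r1, d1), (r2, d2) = a, b
    if INF in (r1, d1, r2, d2):  # an infinite coordinate yields 0-bar
        return ZERO
    s = d1 + d2
    return (r1 + r2 - s // c, s % c)


def value(h, c):
    """Rational number r - d/c encoded by a finite pair h = (r, d)."""
    r, d = h
    return Fraction(r) - Fraction(d, c)
```

For example, with $c = 3$, `sr_mul((2, 1), (0, 2), 3)` returns `(1, 0)`, matching $(2 - 1/3) + (0 - 2/3) = 1$.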

Theorem 4.1 The system $(S, \mathrm{MIN}, \odot, \bar{0}, \bar{1})$ is a closed semiring.

Proof:  We  prove  the  theorem  by  showing  that  the  system  satisfies  all  the  properties 
of  a  closed  semiring. 

1. $(S, \mathrm{MIN}, \bar{0})$ is a monoid. The following properties hold:

• Closedness under MIN. Obvious.

• Associativity of MIN. We must show that

$$\big((r_1, d_1) \,\mathrm{MIN}\, (r_2, d_2)\big) \,\mathrm{MIN}\, (r_3, d_3) = (r_1, d_1) \,\mathrm{MIN}\, \big((r_2, d_2) \,\mathrm{MIN}\, (r_3, d_3)\big). \tag{4.1}$$

The left-hand side of equation (4.1) can be rewritten as

$$\Big(\min\{\min\{r_1, r_2\}, r_3\},\ I\big(I(d_1, d_2; r_1, r_2), d_3;\ \min\{r_1, r_2\}, r_3\big)\Big).$$

The right-hand side of equation (4.1) can be rewritten as

$$\Big(\min\{r_1, \min\{r_2, r_3\}\},\ I\big(d_1, I(d_2, d_3; r_2, r_3);\ r_1, \min\{r_2, r_3\}\big)\Big).$$

Since $\min\{\min\{r_1, r_2\}, r_3\} = \min\{r_1, \min\{r_2, r_3\}\}$, the first coordinates of the two sides are clearly equal. For associativity to hold, it remains to show that

$$I\big(I(d_1, d_2; r_1, r_2), d_3;\ \min\{r_1, r_2\}, r_3\big) = I\big(d_1, I(d_2, d_3; r_2, r_3);\ r_1, \min\{r_2, r_3\}\big). \tag{4.2}$$

Applying the definitions of the operations to both sides of equation (4.2), we obtain the same expression:

$$\begin{cases} d_i & \text{if } r_i < r_j, r_k,\\ \max\{d_i, d_j\} & \text{if } r_i = r_j < r_k,\\ \max\{d_1, d_2, d_3\} & \text{if } r_1 = r_2 = r_3, \end{cases}$$

for distinct $i, j, k \in \{1, 2, 3\}$. Therefore, MIN is associative.


• $\bar{0}$ is an identity element. We have

$$(r, d) \,\mathrm{MIN}\, (\infty, \infty) = (\infty, \infty) \,\mathrm{MIN}\, (r, d) = \big(\min\{r, \infty\},\ I(d, \infty; r, \infty)\big) = (r, d).$$

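2. $(S, \odot, \bar{1})$ is a monoid. The following properties hold:

• Closedness under $\odot$. Obvious.

• Associativity of $\odot$. We must show that

$$\big((a, b) \odot (r, d)\big) \odot (e, f) = (a, b) \odot \big((r, d) \odot (e, f)\big). \tag{4.3}$$

If one of the pairs equals $(\infty, \infty)$, the relation holds, since both sides equal $(\infty, \infty)$. In general, the left-hand side of equation (4.3) equals

$$\big(a + r + e - (b + d) \operatorname{div} c - (f + (b + d) \bmod c) \operatorname{div} c,\ (b + d + f) \bmod c\big)$$

and the right-hand side of equation (4.3) equals

$$\big(a + r + e - (f + d) \operatorname{div} c - (b + (f + d) \bmod c) \operatorname{div} c,\ (b + d + f) \bmod c\big).$$

In order to prove associativity, it remains to show that

$$(b + d) \operatorname{div} c + (f + (b + d) \bmod c) \operatorname{div} c = (f + d) \operatorname{div} c + (b + (f + d) \bmod c) \operatorname{div} c. \tag{4.4}$$

The left-hand side of equation (4.4) can be written as

$$\begin{aligned} (b + d) \operatorname{div} c + (f + (b + d) \bmod c) \operatorname{div} c &= \left\lfloor \frac{b + d}{c} \right\rfloor + \left\lfloor \frac{f + (b + d) \bmod c}{c} \right\rfloor \\ &= \frac{b + d - (b + d) \bmod c}{c} + \left\lfloor \frac{f + (b + d) \bmod c}{c} \right\rfloor \\ &= \left\lfloor \frac{b + d - (b + d) \bmod c}{c} + \frac{f + (b + d) \bmod c}{c} \right\rfloor \\ &= \left\lfloor \frac{b + d + f}{c} \right\rfloor, \end{aligned}$$

where the third step uses the fact that the first term is an integer. Similarly, the right-hand side of equation (4.4) can be shown to be equal to $\lfloor (b + d + f)/c \rfloor$. Therefore, $\odot$ is associative.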
• $\bar{1}$ is an identity element. We have

$$(r, d) \odot (0, 0) = (0, 0) \odot (r, d) = \big(r + 0 - (d + 0) \operatorname{div} c,\ d +_c 0\big) = (r, d).$$


3. $\bar{0}$ is an annihilator. We have

$$(\infty, \infty) \odot (r, d) = (r, d) \odot (\infty, \infty) = (\infty + r,\ \infty +_c d) = (\infty, \infty).$$


4. MIN is commutative. We have

$$(a, b) \,\mathrm{MIN}\, (r, d) = \big(\min\{a, r\},\ I(b, d; a, r)\big) = \big(\min\{r, a\},\ I(d, b; r, a)\big) = (r, d) \,\mathrm{MIN}\, (a, b),$$

since clearly $I(b, d; a, r) = I(d, b; r, a)$.

5. MIN is idempotent. By a simple application of the definition,

$$(r, d) \,\mathrm{MIN}\, (r, d) = \big(\min\{r, r\},\ I(d, d; r, r)\big) = (r, d).$$

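6. $\odot$ distributes over MIN. We must show that

$$(a, b) \odot \big((r, d) \,\mathrm{MIN}\, (e, f)\big) = \big((a, b) \odot (r, d)\big) \,\mathrm{MIN}\, \big((a, b) \odot (e, f)\big). \tag{4.5}$$

For convenience, let $(L_1, L_2) = (a, b) \odot \big((r, d) \,\mathrm{MIN}\, (e, f)\big)$ and $(R_1, R_2) = \big((a, b) \odot (r, d)\big) \,\mathrm{MIN}\, \big((a, b) \odot (e, f)\big)$. Applying the definitions, we have

$$L_1 = \min\{a + r, a + e\} - I(b + d, b + f; r, e) \operatorname{div} c, \qquad L_2 = \big(b + I(d, f; r, e)\big) \bmod c;$$

and

$$R_1 = \min\{a + r - (b + d) \operatorname{div} c,\ a + e - (b + f) \operatorname{div} c\},$$

$$R_2 = I\big((b + d) \bmod c, (b + f) \bmod c;\ a + r - (b + d) \operatorname{div} c,\ a + e - (b + f) \operatorname{div} c\big).$$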
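First we show that $L_1 = R_1$:

$$\begin{aligned} L_1 &= \min\{a + r, a + e\} - I(b + d, b + f; r, e) \operatorname{div} c \\ &= \min\{a + r, a + e\} - I\big((b + d) \operatorname{div} c, (b + f) \operatorname{div} c;\ r, e\big) \\ &= \begin{cases} a + r - (b + d) \operatorname{div} c & \text{if } r < e,\\ a + r - \max\{(b + d) \operatorname{div} c, (b + f) \operatorname{div} c\} & \text{if } r = e,\\ a + e - (b + f) \operatorname{div} c & \text{if } r > e \end{cases} \\ &= \min\{a + r - (b + d) \operatorname{div} c,\ a + e - (b + f) \operatorname{div} c\} \\ &= R_1. \end{aligned}$$

The second equality holds because $\operatorname{div} c$ is nondecreasing and therefore commutes with taking the maximum; the fourth holds because $(b + d) \operatorname{div} c$ and $(b + f) \operatorname{div} c$ lie in $\{0, 1\}$ (as $b, d, f < c$), so when $r \ne e$ the smaller of $a + r$ and $a + e$ remains no larger after the subtraction.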

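Now, we show that $L_2 = R_2$:

$$L_2 = \big(b + I(d, f; r, e)\big) \bmod c = I(b + d, b + f; r, e) \bmod c.$$

Consider the following three possible combinations of values for $(b + d) \operatorname{div} c$ and $(b + f) \operatorname{div} c$.

Case 1: $(b + d) \operatorname{div} c = (b + f) \operatorname{div} c$. Then

$$\begin{aligned} L_2 &= I\big((b + d) \bmod c, (b + f) \bmod c;\ r, e\big) \\ &= I\big((b + d) \bmod c, (b + f) \bmod c;\ a + r - (b + d) \operatorname{div} c,\ a + e - (b + f) \operatorname{div} c\big) \\ &= R_2. \end{aligned}$$

Case 2: $(b + d) \operatorname{div} c = 1$ and $(b + f) \operatorname{div} c = 0$. In this case $f < d$ and $(b + d) \bmod c < (b + f) \bmod c$, since $(b + d) \bmod c = b + d - c < b + f = (b + f) \bmod c$ because $d < c$. Now, consider the two possible relations between $r$ and $e$.

$r \le e$: In this case $a + r - (b + d) \operatorname{div} c < a + e - (b + f) \operatorname{div} c$, and consequently

$$L_2 = R_2 = (b + d) \bmod c.$$

$r > e$: In this case $a + r - (b + d) \operatorname{div} c \ge a + e - (b + f) \operatorname{div} c$, and

$$L_2 = (b + f) \bmod c = \max\{(b + d) \bmod c, (b + f) \bmod c\} = R_2.$$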
Case 3: $(b + d) \operatorname{div} c = 0$ and $(b + f) \operatorname{div} c = 1$. Symmetric to Case 2.

From Cases 1, 2, and 3 we conclude that $L_2 = R_2$. Therefore, equality (4.5) holds, since $(L_1, L_2) = (R_1, R_2)$, and consequently $\odot$ distributes over MIN.

7. MIN gives a unique result when operating on countably infinite sequences of arguments. Also, associativity, commutativity, and idempotence apply to infinite as well as finite sums, as can be readily seen from the definitions of the operations.

8. The multiplicative operation $\odot$ distributes over countably infinite sums, as we can easily demonstrate by a simple induction.

Items 1 through 8 demonstrate the correctness of the theorem. □
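Items 1 through 6 can also be spot-checked by brute force. The sketch below is our own test harness (the function names and the sampled range of $r$ are hypothetical choices): it enumerates a small slice of $S$, including $\bar{0}$, for a given $c$ and asserts each finite-sum axiom. It cannot exercise the countably infinite sums of items 7 and 8, and passing it is evidence, not proof.

```python
import itertools
import math

INF = math.inf
ZERO, ONE = (INF, INF), (0, 0)  # assumed encodings of 0-bar and 1-bar


def I(d1, d2, r1, r2):
    return d1 if r1 < r2 else d2 if r1 > r2 else max(d1, d2)


def sr_min(a, b):
    return (min(a[0], b[0]), I(a[1], b[1], a[0], b[0]))


def sr_mul(a, b, c):
    if INF in (*a, *b):
        return ZERO
    s = a[1] + b[1]
    return (a[0] + b[0] - s // c, s % c)


def check_axioms(c, r_values=range(-2, 3)):
    """Exhaustively test items 1-6 of the closed-semiring definition on a finite slice of S."""
    S = [(r, d) for r in r_values for d in range(c)] + [ZERO]
    for a in S:
        assert sr_min(a, ZERO) == a                          # item 1: identity for MIN
        assert sr_mul(a, ONE, c) == a == sr_mul(ONE, a, c)   # item 2: identity for the product
        assert sr_mul(ZERO, a, c) == ZERO                    # item 3: annihilator
        assert sr_min(a, a) == a                             # item 5: idempotence
    for a, b in itertools.product(S, repeat=2):
        assert sr_min(a, b) == sr_min(b, a)                  # item 4: commutativity
    for a, b, x in itertools.product(S, repeat=3):
        assert sr_min(sr_min(a, b), x) == sr_min(a, sr_min(b, x))                      # item 1
        assert sr_mul(sr_mul(a, b, c), x, c) == sr_mul(a, sr_mul(b, x, c), c)          # item 2
        assert sr_mul(a, sr_min(b, x), c) == sr_min(sr_mul(a, b, c), sr_mul(a, x, c))  # item 6
    return True
```

Calling `check_axioms(3)` runs every assertion on 16 elements of $S$ (five values of $r$, three of $d$, plus $\bar{0}$).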

4.3  An  Algorithm  for  Unit-Delay  Circuitry  Retiming 

In this section we give a Bellman-Ford type algorithm for retiming of unit-delay circuitry, which operates on the original circuit graph. Specifically, given a unit-delay circuit $G = (V, E, 1, w)$ and a positive integer $c$, we determine a retiming $r$ of $G$ such that the clock period $\Phi(G_r)$ of the retimed circuit $G_r$ satisfies $\Phi(G_r) \le c$. Our algorithm terminates in $O(VE)$ steps, and it matches the best previously known strongly polynomial algorithm for the same problem [17], which, according to Theorem 3.1, is obtained by running Bellman-Ford on $G - 1/c$.

For the reader's convenience, we give here, without proof of correctness, the Bellman-Ford algorithm on $G - 1/c$ for unit-delay circuitry retiming.

Algorithm BF This algorithm, given a unit-delay circuit graph $G = (V, E, 1, w)$ and an upper bound $c$ for the clock period, determines a function $p : V \to \mathbb{R}$ such that the retimed circuit $G_{\lceil p \rceil}$ satisfies the clock-period constraint $\Phi(G_{\lceil p \rceil}) \le c$.

1. For some vertex $s \in V$ set $p(s) = 0$. For all vertices $v$ in $V - \{s\}$ set $p(v) = \infty$.

2. Repeat $V - 1$ times:

For each edge $u \xrightarrow{e} v \in E$ set $p(v) = \min\{p(v),\ p(u) + w(e) - 1/c\}$.

3. For each edge $u \xrightarrow{e} v \in E$ set $w_r(e) = w(e) + \lceil p(u) \rceil - \lceil p(v) \rceil$. □
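Algorithm BF translates almost line for line into code. In the sketch below (function and parameter names are ours), vertices are hashable labels, edges are `(u, v, w)` triples with integer register counts, and exact `Fraction` arithmetic keeps the comparisons involving $1/c$ free of rounding error. As in the statement above, no negative-cycle test is performed: the sketch assumes that $G - 1/c$ has no negative cycle and that $s$ reaches every vertex.

```python
import math
from fractions import Fraction


def algorithm_bf(vertices, edges, c, s):
    """Bellman-Ford on G - 1/c (steps 1-3 of Algorithm BF).

    vertices: list of vertex labels; edges: list of (u, v, w) with integer w(e);
    c: clock-period bound; s: start vertex (assumed to reach every vertex).
    Returns the labels p(v) and the retimed weights w_r(e), in edge order.
    """
    p = {v: math.inf for v in vertices}      # step 1
    p[s] = Fraction(0)
    for _ in range(len(vertices) - 1):       # step 2
        for u, v, w in edges:
            if p[u] != math.inf:
                p[v] = min(p[v], p[u] + w - Fraction(1, c))
    # step 3: w_r(e) = w(e) + ceil(p(u)) - ceil(p(v))
    w_r = [w + math.ceil(p[u]) - math.ceil(p[v]) for u, v, w in edges]
    return p, w_r


# Hypothetical example: a ring a -> b -> c -> a of unit-delay vertices, two registers, c = 2.
ring = [("a", "b", 2), ("b", "c", 0), ("c", "a", 0)]
p, w_r = algorithm_bf(["a", "b", "c"], ring, 2, "a")
```

On this ring the labels are $p(a) = 0$, $p(b) = 3/2$, $p(c) = 1$, and step 3 returns $w_r = [0, 1, 1]$: the two registers now sit on $b \to c$ and $c \to a$, so the longest register-free path, $a \to b$, contains two unit delays.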

In our algorithm, the additive and multiplicative operations utilized are the MIN and $\odot$ operations introduced in the previous section. The elements of the set $S$ of the semiring are the labels $h(v) = (r(v), d(v))$ associated with each vertex $v$ of the graph. The algorithm proceeds as follows:

Algorithm R This algorithm, given a unit-delay circuit graph $G = (V, E, 1, w)$ and an upper bound $c$ for the clock period, determines a retiming $r$ such that the retimed circuit satisfies the clock-period constraint $\Phi(G_r) \le c$.

1. For some vertex $s \in V$ set $h(s) = (0, 0)$. For all vertices $v$ in $V - \{s\}$ set $h(v) = (\infty, \infty)$.

2. Repeat $V - 1$ times:

For each edge $u \xrightarrow{e} v \in E$ set $h(v) = \mathrm{MIN}\big(h(v),\ h(u) \odot (w(e), 1)\big)$.

3. For each edge $u \xrightarrow{e} v \in E$ set $w_r(e) = w(e) + r(u) - r(v)$. □
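Algorithm R can be sketched the same way. The version below (again with hypothetical names, and restating the two semiring operations so that the block runs on its own) manipulates only integer pairs and never the fractional edge weights of $G - 1/c$. It makes the same assumption as Algorithm BF about reachability: every vertex is reachable from $s$.

```python
import math

INF = math.inf


def I(d1, d2, r1, r2):
    """d of the pair with smaller r; on a tie the larger d (i.e., the smaller r - d/c)."""
    return d1 if r1 < r2 else d2 if r1 > r2 else max(d1, d2)


def sr_min(a, b):
    return (min(a[0], b[0]), I(a[1], b[1], a[0], b[0]))


def sr_mul(a, b, c):
    if INF in (*a, *b):
        return (INF, INF)
    s = a[1] + b[1]
    return (a[0] + b[0] - s // c, s % c)


def algorithm_r(vertices, edges, c, s):
    """Algorithm R: Bellman-Ford over (S, MIN, product) on the original graph G.

    edges: list of (u, v, w) with integer w(e). Assumes every vertex is reachable from s.
    Returns the labels h(v) = (r(v), d(v)) and the retimed weights w_r(e).
    """
    h = {v: (INF, INF) for v in vertices}        # step 1
    h[s] = (0, 0)
    for _ in range(len(vertices) - 1):          # step 2
        for u, v, w in edges:
            h[v] = sr_min(h[v], sr_mul(h[u], (w, 1), c))
    r = {v: h[v][0] for v in vertices}
    w_r = [w + r[u] - r[v] for u, v, w in edges]  # step 3
    return h, w_r


ring = [("a", "b", 2), ("b", "c", 0), ("c", "a", 0)]  # hypothetical example, c = 2
h, w_r = algorithm_r(["a", "b", "c"], ring, 2, "a")
```

For the three-vertex ring in the example, the labels are $h(a) = (0, 0)$, $h(b) = (2, 1)$, and $h(c) = (1, 0)$, encoding the values $0$, $3/2$, and $1$, and the retimed weights are $[0, 1, 1]$. Step 3 reads only $r(v)$; the second coordinate $d(v)$ is bookkeeping that keeps MIN and $\odot$ exact.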

The correctness of Algorithm R is ensured by the following lemma, which shows that the operation of Algorithm R on $G$ simulates the operation of Algorithm BF on $G - 1/c$.


Lemma 4.1 Let $G = (V, E, 1, w)$ be a unit-delay circuit graph. Let $p(v)$ be the variables of Bellman-Ford on $G - 1/c$ for each vertex $v \in V$. Moreover, let $(r(v), d(v))$ be the variables of Algorithm R on $G$ for each vertex $v \in V$. If both Algorithm BF and Algorithm R relax edges in the same order, then after each relaxation of an edge $u \xrightarrow{e} v \in E$ we have $p(v) = r(v) - d(v)/c$ for every vertex $v \in V$.

Proof: The proof is by induction on the relaxations. Let $p^b(v)$ and $(r^b(v), d^b(v))$ denote the values of the variables of Algorithms BF and R, respectively, before the relaxation of an edge $x \xrightarrow{e} y$. Similarly, let $p^a(v)$ and $(r^a(v), d^a(v))$ denote the values of the variables after the relaxation of an edge $x \xrightarrow{e} y$. We shall show that if $p^b(v) = r^b(v) - d^b(v)/c$ for every vertex $v \in V$ and both algorithms relax the same edge $x \xrightarrow{e} y \in E$, then $p^a(v) = r^a(v) - d^a(v)/c$ for every vertex $v \in V$ after the relaxation.

Initially, before any relaxation is performed, the statement holds, assuming $\infty = \infty - \infty/c$ for every vertex $v \in V - \{s\}$, and since $0 = 0 - 0/c$ for $v = s$. Now, let $x \xrightarrow{e} y$ be the edge to be relaxed. Then

$$(r^a(v), d^a(v)) = (r^b(v), d^b(v))$$

for every vertex $v \ne y$, and

$$r^a(y) = \min\big\{r^b(y),\ r^b(x) + w(e) - \lfloor (d^b(x) + 1)/c \rfloor\big\},$$

$$d^a(y) = I\big(d^b(y),\ (d^b(x) + 1) \bmod c;\ r^b(y),\ r^b(x) + w(e) - \lfloor (d^b(x) + 1)/c \rfloor\big).$$

We consider the following three cases, based on whether $r^b(y)$ is smaller than, greater than, or equal to $r^b(x) + w(e) - \lfloor (d^b(x) + 1)/c \rfloor$.

1. $r^b(y) < r^b(x) + w(e) - \lfloor (d^b(x) + 1)/c \rfloor$. In this case the relaxation of edge $x \xrightarrow{e} y$ by Algorithm R yields

$$(r^a(y), d^a(y)) = (r^b(y), d^b(y)). \tag{4.6}$$

Now, we want to find $p^a(y)$ due to the relaxation of edge $x \xrightarrow{e} y$ by Algorithm BF. Since $r^b(v) \in \mathbb{Z}$ for all vertices $v \in V$, the inequality $r^b(y) < r^b(x) + w(e) - \lfloor (d^b(x) + 1)/c \rfloor$ implies that

$$\begin{aligned} r^b(y) &\le r^b(x) + w(e) - \lfloor (d^b(x) + 1)/c \rfloor - 1 \\ &< r^b(x) + w(e) - (d^b(x) + 1)/c \\ &= p^b(x) + w(e) - 1/c, \end{aligned}$$

given that $1 + \lfloor (d^b(x) + 1)/c \rfloor > (d^b(x) + 1)/c$ for $d^b(x) \in \{0, \ldots, c - 1\}$, and that $r^b(x) - d^b(x)/c = p^b(x)$ by the inductive assumption. We also have that

$$p^b(y) = r^b(y) - d^b(y)/c \le r^b(y).$$

Therefore $p^b(y) < p^b(x) + w(e) - 1/c$, and the relaxation of edge $x \xrightarrow{e} y$ by Algorithm BF yields

$$p^a(y) = p^b(y). \tag{4.7}$$

From equations (4.6) and (4.7) and the inductive assumption, we have that

$$p^a(y) = r^a(y) - d^a(y)/c.$$

2. $r^b(y) > r^b(x) + w(e) - \lfloor (d^b(x) + 1)/c \rfloor$. In this case the relaxation of edge $x \xrightarrow{e} y$ by Algorithm R yields

$$(r^a(y), d^a(y)) = \big(r^b(x) + w(e) - \lfloor (d^b(x) + 1)/c \rfloor,\ (d^b(x) + 1) \bmod c\big). \tag{4.8}$$

Now, we want to find $p^a(y)$ due to the relaxation of edge $x \xrightarrow{e} y$ by Algorithm BF. Since $r^b(v) \in \mathbb{Z}$ and $d^b(v)/c < 1$ for all vertices $v \in V$, the inequality $r^b(y) > r^b(x) + w(e) - \lfloor (d^b(x) + 1)/c \rfloor$ implies that

$$\begin{aligned} p^b(y) &= r^b(y) - d^b(y)/c \\ &\ge r^b(x) + w(e) - \lfloor (d^b(x) + 1)/c \rfloor + 1 - d^b(y)/c \\ &> r^b(x) + w(e) - \lfloor (d^b(x) + 1)/c \rfloor \\ &\ge r^b(x) + w(e) - (d^b(x) + 1)/c \\ &= p^b(x) + w(e) - 1/c. \end{aligned}$$

Therefore the relaxation of edge $x \xrightarrow{e} y$ by Algorithm BF yields

$$p^a(y) = p^b(x) + w(e) - 1/c. \tag{4.9}$$

From equations (4.8) and (4.9) and the inductive assumption, we have that

$$\begin{aligned} r^a(y) - d^a(y)/c &= r^b(x) + w(e) - \lfloor (d^b(x) + 1)/c \rfloor - \big((d^b(x) + 1) \bmod c\big)/c \\ &= p^b(x) + d^b(x)/c + w(e) - \lfloor (d^b(x) + 1)/c \rfloor - \big((d^b(x) + 1) \bmod c\big)/c \\ &= p^b(x) + w(e) - 1/c + (d^b(x) + 1)/c - \lfloor (d^b(x) + 1)/c \rfloor - \big((d^b(x) + 1) \bmod c\big)/c \\ &= p^b(x) + w(e) - 1/c \\ &= p^a(y), \end{aligned}$$

since

$$(d^b(x) + 1)/c = \lfloor (d^b(x) + 1)/c \rfloor + \big((d^b(x) + 1) \bmod c\big)/c \tag{4.10}$$

for $d^b(x) \in \{0, \ldots, c - 1\}$, as can easily be verified by checking the cases $d^b(x) = c - 1$ and $d^b(x) < c - 1$.

3. $r^b(y) = r^b(x) + w(e) - \lfloor (d^b(x) + 1)/c \rfloor$. In this case the relaxation of edge $x \xrightarrow{e} y$ by Algorithm R yields

$$(r^a(y), d^a(y)) = \big(r^b(y),\ \max\{d^b(y),\ (d^b(x) + 1) \bmod c\}\big). \tag{4.11}$$

Now, we want to find $p^a(y)$ due to the relaxation of edge $x \xrightarrow{e} y$ by Algorithm BF. By the inductive assumption and equation (4.10), the equality $r^b(y) = r^b(x) + w(e) - \lfloor (d^b(x) + 1)/c \rfloor$ implies that

$$\begin{aligned} p^b(y) &= p^b(x) + w(e) + \big(d^b(x) - d^b(y)\big)/c - \lfloor (d^b(x) + 1)/c \rfloor \\ &= p^b(x) + w(e) + \big(d^b(x) - d^b(y)\big)/c - (d^b(x) + 1)/c + \big((d^b(x) + 1) \bmod c\big)/c \\ &= p^b(x) + w(e) - 1/c + \big((d^b(x) + 1) \bmod c - d^b(y)\big)/c. \end{aligned}$$

We consider two cases, depending on the ordering of $(d^b(x) + 1) \bmod c$ and $d^b(y)$.

Case A: $(d^b(x) + 1) \bmod c - d^b(y) \ge 0$. In this case $p^b(y) \ge p^b(x) + w(e) - 1/c$, and Algorithm BF yields

$$p^a(y) = p^b(x) + w(e) - 1/c. \tag{4.12}$$

Here $d^a(y) = (d^b(x) + 1) \bmod c$, so from equations (4.11), (4.12), and (4.10), and the inductive assumption, we have that

$$\begin{aligned} r^a(y) - d^a(y)/c &= r^b(x) + w(e) - \lfloor (d^b(x) + 1)/c \rfloor - \big((d^b(x) + 1) \bmod c\big)/c \\ &= p^b(x) + d^b(x)/c + w(e) - \lfloor (d^b(x) + 1)/c \rfloor - \big((d^b(x) + 1) \bmod c\big)/c \\ &= p^b(x) + d^b(x)/c + w(e) - (d^b(x) + 1)/c \\ &= p^b(x) + w(e) - 1/c \\ &= p^a(y). \end{aligned}$$

Case B: $(d^b(x) + 1) \bmod c - d^b(y) < 0$. In this case Algorithm BF yields

$$p^a(y) = p^b(y). \tag{4.13}$$

Here $d^a(y) = d^b(y)$, so from equations (4.11) and (4.13) and the inductive assumption, we have that

$$r^a(y) - d^a(y)/c = r^b(y) - d^b(y)/c = p^b(y) = p^a(y).$$

Therefore, we still have $p^a(y) = r^a(y) - d^a(y)/c$.

From Cases 1, 2, and 3 we conclude that if both algorithms relax edges in the same order, then $p(v) = r(v) - d(v)/c$ for every $v \in V$ after each relaxation. □

Now, the following theorem shows that the set of values $r(v)$ computed by Algorithm R yields a legal retiming of the unit-delay circuit $G$.

Theorem 4.2 Let $G = (V, E, 1, w)$ be a unit-delay circuit graph and let $c$ be a positive integer. Also, let $(r(v), d(v))$ be the variables of Algorithm R on $G$ for each vertex $v \in V$. Then, after the termination of Algorithm R, $r(v)$ yields a retiming of $G$ such that $\Phi(G_r) \le c$.

Proof: Let $p(v)$ be the variables of Bellman-Ford on $G - 1/c$ for each vertex $v \in V$. From Lemma 4.1 and the facts that $r(v) \in \mathbb{Z}$ and $0 \le d(v) < c$ for every $v \in V$, we have that $\lceil p(v) \rceil = r(v)$. This equality and the correctness of Algorithm BF imply that $r(v)$ yields a legal retiming of $G$ such that $\Phi(G_r) \le c$. □
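Lemma 4.1 and the rounding step in the proof of Theorem 4.2 lend themselves to a randomized check. The sketch below is our own harness, with hypothetical names and randomly generated test graphs: it relaxes the same edge sequence in both algorithms and asserts $p(v) = r(v) - d(v)/c$ and $\lceil p(v) \rceil = r(v)$ after every single relaxation. Each test graph contains the path $0 \to 1 \to \cdots \to n - 1$ so that every vertex is reachable from the source $0$; because the lemma concerns individual relaxations, the check does not require $G - 1/c$ to be free of negative cycles.

```python
import math
import random
from fractions import Fraction

INF = math.inf


def I(d1, d2, r1, r2):
    return d1 if r1 < r2 else d2 if r1 > r2 else max(d1, d2)


def relax_r(hu, hv, w, c):
    """One relaxation of Algorithm R: MIN(h(v), h(u) * (w(e), 1))."""
    if hu[0] == INF:
        return hv  # product with 0-bar is 0-bar, the identity of MIN
    s = hu[1] + 1
    cand = (hu[0] + w - s // c, s % c)
    return (min(hv[0], cand[0]), I(hv[1], cand[1], hv[0], cand[0]))


def lockstep(n, m, c, seed=0):
    """Run BF on G - 1/c and Algorithm R on G in lockstep on a random graph."""
    rng = random.Random(seed)
    # Hypothetical test graph: a spanning path plus m random edges (self-loops allowed).
    edges = [(i, i + 1, rng.randint(0, 2)) for i in range(n - 1)]
    edges += [(rng.randrange(n), rng.randrange(n), rng.randint(0, 2)) for _ in range(m)]
    p = {v: INF for v in range(n)}
    h = {v: (INF, INF) for v in range(n)}
    p[0], h[0] = Fraction(0), (0, 0)
    for _ in range(n - 1):
        for u, v, w in edges:
            if p[u] != INF:
                p[v] = min(p[v], p[u] + w - Fraction(1, c))
            h[v] = relax_r(h[u], h[v], w, c)
            for x in range(n):  # invariant of Lemma 4.1 after every relaxation
                r, d = h[x]
                assert (p[x] == INF) == (r == INF)
                if r != INF:
                    assert p[x] == r - Fraction(d, c)
                    assert math.ceil(p[x]) == r  # rounding step of Theorem 4.2
    return True
```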

In summary, this chapter exhibits the closed semiring structure of retiming on a unit-delay circuit $G$ and demonstrates a Bellman-Ford type algorithm, which uses the additive and multiplicative operations of the semiring in order to compute a legal retiming of the circuit. The algorithm operates only on the original graph $G$, and its running time matches that of the best previously known strongly polynomial algorithm for the same problem.


Chapter  5 


A Mixed-Integer Optimization Problem


This chapter investigates a mixed-integer optimization problem which arises in the mixed-integer optimization framework of retiming, as it was introduced in [17]. We present a polynomial-time algorithm for the problem that is based on the technique of introducing additional constraints, known as cuts, in such a way that the integrality constraints of the mixed-integer problem are met by the optimum solution of its linear programming relaxation.

The five sections of the chapter are organized as follows. Section 5.1 reviews the problem of finding a minimum-cost flow on a network. It also presents the dual problem of a minimum-cost flow and gives optimality conditions which relate primal and dual solutions. Section 5.2 introduces the mixed-integer optimization problem that we solve in this chapter. The problem is identified as the restricted case of a mixed-integer dual of an uncapacitated minimum-cost flow, because the relaxation of its integrality constraints reduces it to the dual of an uncapacitated minimum-cost flow problem.
Based on this observation, we develop feasibility and optimality conditions for the mixed-integer problem in Section 5.3. Section 5.4 describes an algorithm that solves the mixed-integer problem in $O(V^3 \lg V)$ steps. Finally, Section 5.5 gives an application of our algorithm by reducing the problem of state minimization of synchronous circuitry to the mixed-integer problem that we solve in this chapter.


5.1 Preliminaries

In this section we give some basic background material on the problem of finding a minimum-cost flow in a network.

A flow network $G = (V, E, w, c)$ is an edge-weighted directed graph in which each edge $u \xrightarrow{e} v \in E$ has a weight $w(e)$ and a capacity $c(e) \ge 0$. Let each vertex $v \in V$ have an associated real value $b(v)$ such that $\sum_{v \in V} b(v) = 0$. A flow $f$ in $G$ is a real-valued function $f : E \to \mathbb{R}$ that satisfies the following two properties:

$$\sum_{v \xrightarrow{e} u \in E} f(e) - \sum_{u \xrightarrow{e} v \in E} f(e) = b(u) \tag{5.1}$$

for all vertices $u \in V$, and

$$0 \le f(e) \le c(e) \tag{5.2}$$

for all edges $u \xrightarrow{e} v \in E$.
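Conditions (5.1) and (5.2) can be checked mechanically. The sketch below assumes (our reading of (5.1)) that its left-hand side is the flow into $u$ minus the flow out of $u$, the sign convention under which the dual constraint takes the form $x(v) - x(u) \le w(e)$; the function name and the `(u, v, w, cap)` edge encoding are hypothetical.

```python
def is_flow(vertices, edges, b, f, tol=0.0):
    """Check properties (5.1) and (5.2) for a candidate flow f.

    edges: list of (u, v, w, cap) with cap possibly float("inf"); f: flow values
    aligned with edges; b: dict vertex -> b(v). Assumed convention for (5.1):
    inflow(u) - outflow(u) = b(u).
    """
    net = {v: 0 for v in vertices}
    for (u, v, _w, cap), fe in zip(edges, f):
        if not 0 <= fe <= cap:  # capacity constraint (5.2)
            return False
        net[v] += fe  # fe enters v ...
        net[u] -= fe  # ... and leaves u
    return all(abs(net[v] - b[v]) <= tol for v in vertices)  # balance (5.1)


# Hypothetical uncapacitated instance: two units from s to t.
INFCAP = float("inf")
edges = [("s", "a", 1, INFCAP), ("a", "t", 2, INFCAP), ("s", "t", 5, INFCAP)]
b = {"s": -2, "a": 0, "t": 2}
print(is_flow(["s", "a", "t"], edges, b, [2, 2, 0]))  # True
print(is_flow(["s", "a", "t"], edges, b, [2, 1, 0]))  # False: (5.1) fails at a and t
```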

A flow network $G = (V, E, w, c)$ with $c(e) = \infty$ for all edges $e \in E$ is called uncapacitated. For simplicity, in the rest of this paper we shall denote an uncapacitated network by $G = (V, E, w)$. The problem of finding a minimum-cost flow on an uncapacitated network $G = (V, E, w)$ is defined as follows.

Problem UMC-Flow (Uncapacitated Minimum-Cost Flow) Let $G = (V, E, w)$ be an uncapacitated flow network. Let each vertex $v \in V$ have an associated real value $b(v)$ such that $\sum_{v \in V} b(v) = 0$. A minimum-cost flow $f$ on $G$ is a flow that minimizes

$$\sum_{u \xrightarrow{e} v \in E} w(e) f(e).$$

The  linear  programming  dual  of  Problem  UMC-Flow  is  defined  as  follows. 

Problem DUMC-Flow (Dual Uncapacitated Minimum-Cost Flow) Let $G = (V, E, w)$ be an uncapacitated flow network. Let each vertex $v \in V$ have an associated real value $b(v)$ such that $\sum_{v \in V} b(v) = 0$. Determine a value $x(v)$ for each vertex $v \in V$ that maximizes $\sum_{v \in V} x(v) b(v)$ subject to

$$x(v) - x(u) \le w(e) \tag{5.3}$$

for all edges $u \xrightarrow{e} v \in E$.


□ 
Note that there are no integrality constraints on the solutions $x$. The mixed-integer problem that we solve, and that we present in the following section, has the form of Problem DUMC-Flow with the addition of integrality constraints on a subset of the variables in $x$.

The following theorem is a direct consequence of the primal-dual relation of Problems UMC-Flow and DUMC-Flow.

Theorem 5.1 Let $f^*$ be a flow that solves Problem UMC-Flow and let $Z_p(f^*) = \sum_{u \xrightarrow{e} v \in E} w(e) f^*(e)$. Similarly, let $x^*$ be a vector that solves Problem DUMC-Flow and let $Z_d(x^*) = \sum_{v \in V} x^*(v) b(v)$. Then $Z_p(f^*) = Z_d(x^*)$. □

Almost all algorithms for Problems UMC-Flow and DUMC-Flow rely on Theorem 5.1, and they usually yield a solution for both problems at the same time. A basic concept used in these algorithms is that of the residual network $G(f)$. The residual network $G(f)$ corresponding to a flow $f$ is defined as follows: we replace each edge $u \xrightarrow{e} v \in E$ by two edges $u \xrightarrow{e} v$ and $v \xrightarrow{e'} u$. The edge $u \xrightarrow{e} v$ has cost $w(e)$ and residual capacity $r(e) = c(e) - f(e)$, and the edge $v \xrightarrow{e'} u$ has cost $-w(e)$ and residual capacity $r(e') = f(e)$. The residual network consists only of arcs with positive residual capacity.

The following theorem gives a necessary and sufficient condition for a flow $f$ to be optimum in terms of the residual network $G(f)$.

Theorem 5.2 Let $G = (V, E, w, c)$ be a flow network. Then a flow $f$ on $G$ is optimum if and only if $G(f)$ contains no negative-weight directed cycles. □
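The residual-network construction and the optimality test of Theorem 5.2 can be sketched as follows. The names and the edge encoding are our assumptions; the negative-cycle test is the standard Bellman-Ford check with a virtual source joined to every vertex at cost 0, which is one way (not necessarily the thesis's) to decide whether $G(f)$ contains a negative-weight directed cycle.

```python
def residual_network(edges, f):
    """Arcs (tail, head, cost) of G(f) with positive residual capacity.

    edges: list of (u, v, w, cap); f: flow values aligned with edges.
    Forward arc u -> v: cost w, residual capacity cap - f.
    Backward arc v -> u: cost -w, residual capacity f.
    """
    arcs = []
    for (u, v, w, cap), fe in zip(edges, f):
        if cap - fe > 0:
            arcs.append((u, v, w))
        if fe > 0:
            arcs.append((v, u, -w))
    return arcs


def has_negative_cycle(vertices, arcs):
    """Bellman-Ford from a virtual source with 0-cost arcs to every vertex."""
    dist = {v: 0 for v in vertices}
    for _ in range(len(vertices)):
        changed = False
        for u, v, w in arcs:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return False  # distances converged: no negative cycle
    return True  # still improving after |V| passes


INFCAP = float("inf")  # hypothetical uncapacitated instance, b(s) = -2, b(t) = 2
edges = [("s", "a", 1, INFCAP), ("a", "t", 2, INFCAP), ("s", "t", 5, INFCAP)]
V = ["s", "a", "t"]
print(has_negative_cycle(V, residual_network(edges, [2, 2, 0])))  # False: optimum
print(has_negative_cycle(V, residual_network(edges, [0, 0, 2])))  # True: s->a->t->s costs -2
```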

Finally, the following lemma [2] demonstrates how to obtain an optimum solution $x$ for Problem DUMC-Flow once an optimum flow $f$ for Problem UMC-Flow is known.

Lemma 5.3 Let $G = (V, E, w, c)$ be a flow network and let $f^*$ be an optimum flow on $G$. Moreover, let $l(v)$ denote the length of the shortest path in $G(f^*)$ from some source $s \in V$ to vertex $v \in V$. Then the assignment $x(v) = l(v)$ for every vertex $v \in V$ is an optimum solution for Problem DUMC-Flow. □
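Lemma 5.3 yields a dual solution with one more shortest-path computation. In the sketch below (hypothetical names; Bellman-Ford is our choice of shortest-path routine), the optimum flow is assumed given and every vertex is assumed reachable from $s$ in $G(f^*)$. The toy instance also checks dual feasibility (5.3) and the equality $Z_p(f^*) = Z_d(x^*)$ of Theorem 5.1.

```python
def dual_from_optimal_flow(vertices, edges, f, s):
    """x(v) = l(v): shortest-path length from s to v in the residual network G(f).

    edges: list of (u, v, w, cap); f: an optimum flow aligned with edges, so G(f)
    has no negative cycle (Theorem 5.2). Assumes every vertex is reachable from s.
    """
    arcs = []
    for (u, v, w, cap), fe in zip(edges, f):
        if cap - fe > 0:
            arcs.append((u, v, w))   # forward residual arc
        if fe > 0:
            arcs.append((v, u, -w))  # backward residual arc
    l = {v: float("inf") for v in vertices}
    l[s] = 0
    for _ in range(len(vertices) - 1):  # Bellman-Ford
        for u, v, w in arcs:
            if l[u] + w < l[v]:
                l[v] = l[u] + w
    return l


INFCAP = float("inf")  # hypothetical instance: ship 2 units from s to t
edges = [("s", "a", 1, INFCAP), ("a", "t", 2, INFCAP), ("s", "t", 5, INFCAP)]
b = {"s": -2, "a": 0, "t": 2}
f_opt = [2, 2, 0]
x = dual_from_optimal_flow(["s", "a", "t"], edges, f_opt, "s")
primal = sum(w * fe for (_, _, w, _), fe in zip(edges, f_opt))
dual = sum(x[v] * b[v] for v in b)
print(x, primal, dual)  # {'s': 0, 'a': 1, 't': 3} 6 6
```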


5.2 Mixed-Integer Dual Minimum-Cost Flow

In this section we present the mixed-integer optimization problem that we solve in this chapter. We refer to the problem as the restricted mixed-integer dual of uncapacitated minimum-cost flow, and we identify it as a special case of a general mixed-integer optimization problem.

The restricted mixed-integer dual of uncapacitated minimum-cost flow is defined as follows.

Problem RMI-Dual-Flow (Restricted Mixed-Integer Dual of Uncapacitated Minimum-Cost Flow) Given an uncapacitated flow network $G = (V, E, w)$ with $w(e) \in \mathbb{R}$, a set $V_I$ such that $V_I \subseteq V$, and an integer $b(v)$ for each vertex $v \in V_I$ such that $\sum_{v \in V_I} b(v) = 0$ and $b(v) = 0$ for all $v \notin V_I$, find a value $x(v)$ for each vertex $v \in V$ that maximizes $Z(x) = \sum_{v \in V_I} x(v) b(v)$ subject to

$$x(v) - x(u) \le w(e) \tag{5.4}$$

for every edge $u \xrightarrow{e} v \in E$, and

$$x(v) \in \mathbb{Z} \tag{5.5}$$

for every $v \in V_I$. □
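A candidate vector can be checked against (5.4) and (5.5) directly. The sketch below, with a hypothetical function name and a small tolerance for the real-valued weights (our addition), returns $Z(x)$ for a feasible $x$ and `None` otherwise. The toy instance shows the effect of integrality: with $b(u_1) = -1$ and $b(u_2) = 1$, the constraints give $x(u_2) - x(u_1) \le 0.5 + 0.7 = 1.2$, so the linear programming relaxation reaches $Z = 1.2$, whereas any vector satisfying (5.5) has $Z \le 1$.

```python
def rmi_objective(x, edges, b, V_I, tol=1e-9):
    """Z(x) = sum of x(v) b(v) over V_I if x satisfies (5.4) and (5.5), else None.

    edges: list of (u, v, w) with real w(e); b: dict on V_I (b(v) = 0 elsewhere).
    tol absorbs floating-point error in (5.4); it is an assumption of this sketch.
    """
    if any(x[v] - x[u] > w + tol for u, v, w in edges):  # (5.4)
        return None
    if not all(float(x[v]).is_integer() for v in V_I):    # (5.5)
        return None
    return sum(x[v] * b[v] for v in V_I)


# Hypothetical instance: V = {u1, m, u2}, V_I = {u1, u2}.
edges = [("u1", "m", 0.5), ("m", "u2", 0.7)]
b = {"u1": -1, "u2": 1}
V_I = ["u1", "u2"]
print(rmi_objective({"u1": 0, "m": 0.5, "u2": 1}, edges, b, V_I))    # 1
print(rmi_objective({"u1": 0, "m": 0.5, "u2": 1.2}, edges, b, V_I))  # None: violates (5.5)
```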

Observe that the maximization of the sum is performed over the subset $V_I$ of $V$, whose variables are required to take on integer values. The reason that we identify the problem as the mixed-integer dual of an uncapacitated minimum-cost flow is that if we relax the integrality constraints (5.5), it reduces to Problem DUMC-Flow, the linear programming dual of an uncapacitated minimum-cost flow problem [12, 21]. Based on this observation, we describe in Section 5.4 an $O(V^3 \lg V)$ time procedure, which solves Problem RMI-Dual-Flow.

We can generalize Problem RMI-Dual-Flow by extending the set over which the maximization is performed to include the entire vertex set $V$ of the graph.

Problem MI-Dual-Flow (Mixed-Integer Dual of Uncapacitated Minimum-Cost Flow) Given an uncapacitated network $G = (V, E, w)$ with $w(e) \in \mathbb{R}$, a set $V_I$ such that $V_I \subseteq V$, and an integer $b(v)$ for each vertex $v \in V$ such that $\sum_{v \in V} b(v) = 0$, find a value $x(v)$ for each vertex $v \in V$ that maximizes $\sum_{v \in V} x(v) b(v)$ subject to

$$x(v) - x(u) \le w(e)$$

for every edge $u \xrightarrow{e} v \in E$, and

$$x(v) \in \mathbb{Z}$$

for every $v \in V_I$. □

We conjecture that, contrary to Problem RMI-Dual-Flow, Problem MI-Dual-Flow is not tractable.

Conjecture. Problem MI-Dual-Flow is NP-complete. □

Two facts support our conjecture. First, the feasible vectors of Problem RMI-Dual-Flow do not form a convex set, due to the integrality constraints. Lack of convexity rules out linear programming approaches that lead to polynomial time algorithms [21, 2]. In addition, the solutions to Problem MI-Dual-Flow do not necessarily exhibit the optimal substructure property. There exist instances of Problem MI-Dual-Flow which, in order to be solved, require a locally suboptimal assignment of values to the unknowns. The lack of optimal substructure rules out dynamic programming approaches that could lead to polynomial time algorithms [1].

5.3  Feasibility  and  Optimality  Conditions 

In this section we develop feasibility and optimality conditions for Problem RMI-Dual-Flow. Specifically, we construct an auxiliary problem by augmenting the constraint set of Problem RMI-Dual-Flow with new constraints, which are derived from the given constraint set. The auxiliary problem has no explicit integrality constraints, and we prove that it is feasible if and only if Problem RMI-Dual-Flow is feasible. Finally, we prove that a solution of the auxiliary problem solves Problem RMI-Dual-Flow as well.

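First, let us describe how the additional constraints are obtained. Let $G = (V, E, w)$ be an edge-weighted graph and let $V_I \subseteq V$. We define the short-cut graph $G_S = (V_S, E_S, w_S)$, with $V_S = V_I$, as follows:

$$E_S = \{u \xrightarrow{e} v : u, v \in V_I,\ u \overset{p}{\leadsto} v \in G\},$$

$$w_S(u \xrightarrow{e} v) = \min\{w(p) : u \overset{p}{\leadsto} v \in G\}.$$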
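The short-cut graph needs the shortest-path weights between all pairs of vertices of $V_I$. The sketch below computes them with Floyd-Warshall, which is our choice for illustration (this excerpt does not say how $w_S$ is computed); the function name and the dictionary return format are hypothetical. It returns `None` when $G$ contains a negative cycle $C$, in which case (5.4) is infeasible anyway, since summing it around $C$ gives $0 \le w(C) < 0$. A pair with $u = v$ appears only when some cycle passes through $u$.

```python
import math


def shortcut_graph(vertices, edges, V_I):
    """E_S and w_S of the short-cut graph, as a dict {(u, v): w_S(u -> v)}.

    edges: list of (u, v, w) with real w(e). For u, v in V_I connected by a path in G,
    w_S(u -> v) is the minimum path weight (Floyd-Warshall, O(V^3)). Returns None if G
    has a negative cycle.
    """
    dist = {u: {v: math.inf for v in vertices} for u in vertices}
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)
    for k in vertices:
        for i in vertices:
            for j in vertices:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    if any(dist[v][v] < 0 for v in vertices):
        return None
    return {(u, v): dist[u][v] for u in V_I for v in V_I if dist[u][v] < math.inf}


# Hypothetical example: the direct edge u1 -> u2 is beaten by the path through m.
ws = shortcut_graph(["u1", "m", "u2"],
                    [("u1", "m", 0.5), ("m", "u2", 0.7), ("u1", "u2", 2.0)],
                    ["u1", "u2"])
print(ws, {e: math.floor(w) for e, w in ws.items()})
```

Here $w_S(u_1 \to u_2) = 1.2$, so the corresponding integer constraint uses $\lfloor 1.2 \rfloor = 1$.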


We also define the dense graph $G_D = (V, E \cup E_S, w_D)$ with edge weights defined as follows:

$$w_D(u \xrightarrow{e} v) = \begin{cases} w(e) & \text{if } e \in E,\\ \lfloor w_S(e) \rfloor & \text{if } e \in E_S. \end{cases}$$

The edges in $E_S$ impose the additional constraints of Problem AUX. We define the auxiliary problem AUX in terms of the original graph $G$ and its corresponding short-cut graph $G_S$.


Problem AUX (Auxiliary Dual of Uncapacitated Minimum-Cost Flow) Let $G = (V, E, w)$, with $w(e) \in \mathbb{R}$, be an edge-weighted graph and let $G_S = (V_S, E_S, w_S)$ be its corresponding short-cut graph. Given a set $V_I$ such that $V_I \subseteq V$, and an integer $b(v)$ for each vertex $v \in V_I$ such that $\sum_{v \in V_I} b(v) = 0$ and $b(v) = 0$ for all $v \notin V_I$, find a value $x(v)$ for each vertex $v \in V$ that maximizes $Z(x) = \sum_{v \in V_I} x(v) b(v)$ subject to

$$x(v) - x(u) \le w(e) \tag{5.6}$$

for every edge $u \xrightarrow{e} v \in E$, and

$$x(v) - x(u) \le \lfloor w_S(e) \rfloor \tag{5.7}$$

for every edge $u \xrightarrow{e} v \in E_S$.

First, we shall prove that feasibility of RMI-Dual-Flow implies feasibility of AUX by showing that the set of solutions of AUX encompasses all solutions of RMI-Dual-Flow. We denote by $X_{RMI}$ the set of feasible vectors for RMI-Dual-Flow and by $X^*_{RMI}$ the set of optimum vectors for RMI-Dual-Flow. Similarly for AUX, we denote its set of feasible vectors by $X_{AUX}$ and its set of optimum vectors by $X^*_{AUX}$.

Lemma 5.4 If $x \in X_{RMI}$, then $x \in X_{AUX}$.

Proof: Let $x = (x(1), \ldots, x(|V|))$ be any vector in $X_{RMI}$. Then, from inequality (5.4), we have that $x(v) - x(u) \le w(e)$ for every edge $u \xrightarrow{e} v \in E$. Therefore, $x$ satisfies inequality (5.6).

Also, for every $u, v \in V_I$, let $p = u \leadsto v$ be the shortest path in $G$ from $u$ to $v$. By applying inequality (5.4) along $p$ and the definition of the short-cut graph $G_S$, we have that $x(v) - x(u) \le w_S(e)$ for $u \xrightarrow{e} v \in E_S$. Since $x(u)$ and $x(v)$ are integers by constraint (5.5), we can write $x(v) - x(u) \le \lfloor w_S(e) \rfloor$ for $u \xrightarrow{e} v \in E_S$. Therefore, $x$ satisfies inequality (5.7) as well, and consequently $x \in X_{AUX}$. □

As  an  immediate  consequence  we  have  the  following. 

Corollary 5.5 For any $x \in X^*_{RMI}$ with $Z(x) = \sum_{v \in V_I} x(v) b(v)$ and for any $y \in X^*_{AUX}$ with $Z(y) = \sum_{v \in V_I} y(v) b(v)$, we have

$$Z(x) \le Z(y).$$

Proof: Since $x \in X^*_{RMI}$, we also have $x \in X_{RMI}$. From Lemma 5.4 we infer that $x \in X_{AUX}$; therefore $Z(x) \le Z(y)$ for every $y \in X^*_{AUX}$. □

Now, we shall prove that feasibility of Problem AUX implies feasibility of Problem RMI-Dual-Flow.

Lemma 5.6 If $X_{AUX} \ne \emptyset$, then $X_{RMI} \ne \emptyset$.

Proof: From Lemma 2.3 and the definitions of $G_S$ and $G_D$, we have that $X_{AUX} \ne \emptyset$ exactly when $G_S$ is well-defined and there exists no negative-weight cycle in $G_D$. Let $x(v)$ be the length of the shortest path in $G_D$ from some source $s \in V$ to the vertex $v \in V$. Then, from Lemma 2.2, $x$ satisfies

$$x(v) - x(u) \le w(e)$$

for every edge $u \xrightarrow{e} v \in E$, and

$$x(v) - x(u) \le \lfloor w_S(e) \rfloor$$

for every edge $u \xrightarrow{e} v \in E_S$. Therefore, $x$ satisfies inequality (5.4).

Moreover, since for any path $p = u \leadsto v$ in $G$ with $u, v \in V_I$ there always exists an edge $u \xrightarrow{e} v \in E_D$ with $w_D(e) \le w(p)$ and $w_D(e) \in \mathbb{Z}$, the shortest path in $G_D$ from the source $s$ to any vertex $v \in V_I$ can be taken to use integer-weight edges only, provided $s \in V_I$. Thus, by setting $x(s) = 0$ we can ensure $x(v) \in \mathbb{Z}$ for all vertices $v \in V_I$. Therefore, $x$ satisfies inequality (5.5) as well, and consequently $x \in X_{RMI}$. □

As  a  consequence  of  Lemmata  5.4  and  5.6,  we  have  the  following  corollary. 


Corollary 5.7 Problem RMI-Dual-Flow is feasible if and only if the short-cut graph $G_S$ is well-defined and the dense graph $G_D$ has no negative-weight directed cycles. □
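Corollary 5.7 and the proof of Lemma 5.6 suggest a direct feasibility procedure: build $G_S$ and $G_D$, compute shortest paths from a source $s \in V_I$, and report infeasibility if $G_D$ has a negative cycle. The sketch below does this with Floyd-Warshall and Bellman-Ford; the function name and these algorithmic choices are ours, not the thesis's, and it assumes every vertex is reachable from $s$ = `V_I[0]` in $G_D$ so that the final negative-cycle test covers the whole graph. On the toy instance it returns $x(u_1) = 0$, $x(m) = 0.5$, $x(u_2) = 1$, which is integral on $V_I$ as Lemma 5.6 predicts.

```python
import math


def rmi_feasible_point(vertices, edges, V_I):
    """Construction from the proof of Lemma 5.6 (sketch).

    Builds w_S by Floyd-Warshall, forms G_D = (V, E + E_S, w_D) with w_D = w on E and
    floor(w_S) on E_S, and returns shortest-path lengths x(v) in G_D from s = V_I[0].
    Returns None if G_S is not well-defined or G_D has a negative cycle (Corollary 5.7).
    Assumes every vertex is reachable from s in G_D.
    """
    INF = math.inf
    dist = {u: {v: INF for v in vertices} for u in vertices}
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)
    for k in vertices:
        for i in vertices:
            for j in vertices:
                dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])
    if any(dist[v][v] < 0 for v in vertices):
        return None  # negative cycle in G: G_S not well-defined
    e_d = list(edges) + [(u, v, math.floor(dist[u][v]))
                         for u in V_I for v in V_I if dist[u][v] < INF]
    s = V_I[0]
    x = {v: INF for v in vertices}
    x[s] = 0
    for _ in range(len(vertices) - 1):  # Bellman-Ford on G_D
        for u, v, w in e_d:
            if x[u] + w < x[v]:
                x[v] = x[u] + w
    if any(x[u] + w < x[v] for u, v, w in e_d):
        return None  # negative cycle in G_D
    return x


print(rmi_feasible_point(["u1", "m", "u2"],
                         [("u1", "m", 0.5), ("m", "u2", 0.7)], ["u1", "u2"]))
```

With $w(a \to b) = 0.5$ and $w(b \to a) = -0.4$ and $V_I = \{a, b\}$, the same function returns `None`: the rounded short-cut edges have weights $0$ and $-1$ and close a negative cycle in $G_D$, matching the fact that integers with $x(b) - x(a) \le 0.5$ and $x(a) - x(b) \le -0.4$ do not exist.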

In the remainder of this section we show how to obtain a solution of Problem AUX that solves Problem RMI-Dual-Flow as well.

First, we shall show that there exists a primal solution of Problem AUX which has a special structure. Then we demonstrate how we can exploit this special structure in order to find a solution for Problem RMI-Dual-Flow. Recall that, according to Section 5.1, the primal of Problem AUX is an uncapacitated minimum-cost flow on

$$G_D = (V_D, E_D, w_D).$$

Lemma 5.8 Let $f$ be a flow on $G_D$ that solves the primal of Problem AUX. Also, let $E_D(f) = \{u \xrightarrow{e} v \in E_D : f(e) > 0\}$. Then there exists a flow $f'$ on $G_D$ that solves the primal of Problem AUX such that

$$E_D(f') \subseteq E_S.$$

Proof: Consider an optimum flow $f$ on $G_D$ with $f(e) > 0$ for some edge $u \xrightarrow{e} v \notin E_S$. We show that by rerouting flow we can always convert $f$ to a new flow $f'$ such that $Z_P(f) = Z_P(f')$ and $E_D(f') \subseteq E_S$. Since $u \xrightarrow{e} v \notin E_S$, there exists a path

$$p_1 = u_0 \xrightarrow{e_0} u_1 \xrightarrow{e_1} \cdots u_{k-1} \xrightarrow{e_{k-1}} u$$

with $u_0 \in V_I$, $u_1, \ldots, u_{k-1} \notin V_I$, and $f(e_i) > 0$ for $i = 0, 1, \ldots, k-1$, and a path

$$p_2 = v \xrightarrow{e'_{l-1}} v_{l-1} \xrightarrow{e'_{l-2}} \cdots v_1 \xrightarrow{e'_0} v_0$$

with $v_0 \in V_I$, $v_1, \ldots, v_{l-1} \notin V_I$, and $f(e'_i) > 0$ for $i = 0, 1, \ldots, l-1$. Note that as long as there exists an edge $u \xrightarrow{e} v \notin E_S$ with $f(e) > 0$, we can always find paths $p_1$ and $p_2$ constructed in this way. If there were no such paths, then the node-balance constraints (5.1) would be violated, since $b(v) = 0$ for every $v \notin V_I$.

Now, since $u_0, v_0 \in V_I$ and $f$ is optimum, there exists an edge $u_0 \xrightarrow{e_I} v_0 \in E_S$ with

$$w_D(e_I) = w_D(p_1; e; p_2),$$

where $p_1; e; p_2$ denotes the path formed by concatenating $p_1$, $e$, and $p_2$. Therefore, we can reroute $\min\{f(a) : a \in p_1; e; p_2\}$ units of flow through $e_I$ and still maintain an optimum flow. Let $f_a$ be the new optimum flow. Then

$$|E_D(f_a) \cap E| \le |E_D(f) \cap E| - 1.$$

Therefore, repeating this procedure until $E_D(f_a) \cap E = \emptyset$ yields an optimum flow $f'$ such that $E_D(f') \subseteq E_S$. □
Now we show how to obtain a solution for Problem AUX that satisfies the integrality constraints of Problem RMI-Dual-Flow. The proof relies on Lemma 5.8 above and on Lemma 5.3 of Section 5.1.

Lemma 5.9 Let $f$ be a solution for the primal of Problem AUX with $E_D(f) \subseteq E_S$. Then there exists $x \in X_{\mathrm{AUX}}$ such that

$$x(v) \in \mathbb{Z}$$

for all $v \in V_I$.

Proof: Let $d_D(f, v)$ denote the length of a shortest path in the residual graph $G_D(f)$ from a source $s \in V_I$ to a vertex $v \in V$. From Lemma 5.3 we know that once an optimum flow $f$ for the primal of Problem AUX is known, the assignment $x(v) = d_D(f, v)$ for every vertex $v \in V$ yields a solution $x$ to Problem AUX.

It remains to show that $x$ satisfies $x(v) \in \mathbb{Z}$ for all $v \in V_I$. Let us denote by $l_D(f, p)$ the length of a path $p$ in $G_D(f)$, and let $p$ be a shortest path in $G_D(f)$ from the source $s \in V_I$ to a vertex $v \in V_I$. We shall prove that $l_D(f, p) = d_D(f, v) \in \mathbb{Z}$. Let

$$q = v_0 \xrightarrow{e_1} v_1 \xrightarrow{e_2} \cdots v_{k-1} \xrightarrow{e_k} v_k$$

be a part of $p$ such that $v_0, v_k \in V_I$ and $v_1, \ldots, v_{k-1} \notin V_I$. Since $E_D(f) \subseteq E_S$, either $e_i \in E \cup E_S$ for all edges $e_i \in q$, or $k = 1$ and $v_0 \xrightarrow{e_1} v_1$ is the backward edge of a flow-carrying edge $v_1 \to v_0 \in E_S$. In the first case $l_D(f, q) \in \mathbb{Z}$, since $q$ has to be a shortest path from $v_0$ to $v_k$ and there always exists an edge $e \in E_S$ from $v_0$ to $v_k$ such that $w_D(e) \le \lfloor l_D(f, q) \rfloor$; hence $l_D(f, q) \le w_D(e) \le \lfloor l_D(f, q) \rfloor$. In the second case $l_D(f, q) \in \mathbb{Z}$, since $v_1 \to v_0 \in E_S$ implies $w_D(v_1 \to v_0) \in \mathbb{Z}$, and $l_D(f, q) = -w_D(v_1 \to v_0)$ by definition. Therefore, $l_D(f, q) \in \mathbb{Z}$ for every such $q$, and consequently $l_D(f, p) \in \mathbb{Z}$. □

Now we can easily infer that the solution of Problem AUX derived in the way suggested in Lemma 5.9 is a solution for Problem RMI-Dual-Flow.

Theorem 5.10 Let $f$ be a solution for the primal of Problem AUX with $E_D(f) \subseteq E_S$. Let $x$ be a solution of a single-source shortest-paths problem on $G_D(f)$ from a source $s \in V_I$. Then $x$ is a solution for Problem RMI-Dual-Flow.

Proof: From Lemma 5.3 we infer that $x \in X_{\mathrm{AUX}}$, and consequently $x$ satisfies constraint (5.6):

$$x(v) - x(u) \le w(e)$$

for all $e \in E$. Therefore, $x$ satisfies constraint (5.8) of Problem RMI-Dual-Flow.

From Lemma 5.9 we have that $x$ also satisfies the integrality constraint (5.9) of Problem RMI-Dual-Flow. Therefore, $x \in X_{\mathrm{RMI}}$, which implies that $Z(y) \ge Z(x)$ for every $y \in X^{*}_{\mathrm{RMI}}$. But from Corollary 5.5 we have that $Z(y) \le Z(x)$. Therefore, $x \in X^{*}_{\mathrm{RMI}}$. □
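Theorem 5.10 reduces the final step to a single-source shortest-paths computation in the residual graph $G_D(f)$. A minimal Python sketch, assuming edges are `(u, v, w)` triples with a parallel list of flow values, and assuming the given flow is already optimal (so the residual graph has no negative cycle): because the network is uncapacitated, every edge keeps its forward residual arc, and an edge carrying positive flow also contributes a backward arc of negated weight. Both helper names are hypothetical.

```python
from math import inf


def residual_arcs(edges, flow):
    """Arcs of the residual graph of an uncapacitated network.

    edges: list of (u, v, w); flow: flow values aligned with edges.
    Forward arcs always exist (no capacities); an edge with positive
    flow also yields a backward arc v -> u of weight -w.
    """
    arcs = []
    for (u, v, w), fe in zip(edges, flow):
        arcs.append((u, v, w))
        if fe > 0:
            arcs.append((v, u, -w))
    return arcs


def shortest_path_lengths(vertices, arcs, source):
    """Bellman-Ford distances from source; assumes no negative-weight cycle.

    Vertices unreachable from source keep distance inf.
    """
    dist = {v: inf for v in vertices}
    dist[source] = 0
    for _ in range(len(vertices) - 1):
        for u, v, w in arcs:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return dist
```

In the small example used in the test, one unit of flow on `s -> a -> t` (cost 5) is optimal against the direct edge of weight 6, and the resulting distances satisfy $x(v) - x(u) \le w(e)$ on every original edge, with equality on the flow-carrying edges.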

5.4  The  Algorithm 

In this section we give the $O(V^3 \lg V)$ algorithm that solves Problem RMI-Dual-Flow. Its correctness relies on the theory developed in the previous section.

Algorithm RMI-Dual-Flow This algorithm determines a solution $x$ for Problem RMI-Dual-Flow.

1. Compute the edges in $E_S$ by solving an all-pairs shortest-paths problem on $G$. Fail if a negative-weight cycle is found.

2. Compute a min-cost flow $f$ on the graph $G_D$.

3. Transform $f$ into $f'$ by rerouting flow in such a way that if $f'(e) > 0$ then $u \xrightarrow{e} v \in E_S$.

4. Compute the shortest-path lengths $x(v)$ for each vertex $v$ in $G_D(f')$ from a source $s \in V_I$. □
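Step 1 depends on the definition of $G_S$ given earlier in the chapter. Reading the proof of Lemma 5.6 as saying that the short-cut edge from $u$ to $v$ (for $u, v \in V_I$) carries the floor of the shortest-path length $w_S(e)$ from $u$ to $v$ in $G$, one possible sketch of Step 1 follows. This reading is our assumption, and the sketch swaps Johnson's algorithm for the simpler $O(V^3)$ Floyd-Warshall algorithm, so it does not meet the bound stated for Step 1. The function name `shortcut_edges` is hypothetical.

```python
from math import floor, inf


def shortcut_edges(vertices, edges, integer_vertices):
    """Hypothetical Step 1: all-pairs shortest paths, then floored short-cuts.

    Uses Floyd-Warshall instead of Johnson's algorithm. Returns a list of
    (u, v, floor(dist(u, v))) for distinct u, v in integer_vertices with v
    reachable from u; raises ValueError if G has a negative-weight cycle.
    """
    dist = {u: {v: inf for v in vertices} for u in vertices}
    for v in vertices:
        dist[v][v] = 0
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)  # keep the lightest parallel edge
    for k in vertices:
        for i in vertices:
            dik = dist[i][k]
            if dik == inf:
                continue
            for j in vertices:
                if dik + dist[k][j] < dist[i][j]:
                    dist[i][j] = dik + dist[k][j]
    # A negative diagonal entry means some cycle has negative total weight.
    if any(dist[v][v] < 0 for v in vertices):
        raise ValueError("negative-weight cycle")
    return [(u, v, floor(dist[u][v]))
            for u in integer_vertices for v in integer_vertices
            if u != v and dist[u][v] < inf]
```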

Step 1 requires $V$ runs of a single-source shortest-paths algorithm. The total cost is $O(V(E + V \lg V))$ using Johnson's all-pairs shortest-paths algorithm [10]. Step 2 executes one uncapacitated min-cost flow algorithm, which requires $O(V \lg V \, (E_D + V \lg V))$ steps using Orlin's strongly polynomial algorithm [19]. Step 3 runs in $O(VE)$ time, since each rerouting eliminates flow from at least one edge in $E$ and requires $O(V)$ steps. Step 4 can be implemented in $O(V E_D)$ time using the Bellman-Ford algorithm for shortest paths. Therefore, the overall running time is $O(V^3 \lg V)$.
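Summing the four step costs gives the stated bound. The derivation below assumes $E = O(V^2)$ and $E_D = O(V^2)$, as holds for graphs without parallel edges; this assumption is ours and is not stated in the text. Under it, Step 2 dominates:

```latex
\underbrace{O\bigl(V(E + V\lg V)\bigr)}_{\text{Step 1}}
+ \underbrace{O\bigl(V\lg V\,(E_D + V\lg V)\bigr)}_{\text{Step 2}}
+ \underbrace{O(VE)}_{\text{Step 3}}
+ \underbrace{O(V E_D)}_{\text{Step 4}}
= O(V^3) + O(V^3 \lg V) + O(V^3) + O(V^3)
= O(V^3 \lg V).
```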

5.5  An  Application  to  State  Minimization 

In this section we present the state minimization problem for retiming from the mathematical programming perspective described in [17], and we give a reduction of the problem to Problem RMI-Dual-Flow. The state minimization problem is defined as follows: For a given circuit $G = (V, E, d, w)$, determine a retiming of the circuit such that the total number of registers $\sum_{e \in E} w_r(e)$ of the retimed circuit is minimized.

First, we state without proof the following theorem from [17]. This theorem describes retiming as a mixed-integer programming problem.

Theorem 5.11 Let $G = (V, E, d, w)$ be a synchronous circuit, and let $c$ be a positive real number. There exists a retiming $r$ of $G$ such that $\Phi(G_r) \le c$ if and only if there exists an assignment of a real value $R(v)$ and an integer value $r(v)$ to each vertex $v \in V$ such that

$$R(v) - r(v) \le -d(v)/c,$$

$$r(v) - R(v) \le 1,$$

for every vertex $v \in V$, and

$$r(v) - r(u) \le w(e),$$

$$R(v) - R(u) \le w(e) - d(v)/c,$$

wherever $u \xrightarrow{e} v$. □
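To make the four inequalities concrete, here is a small Python checker that tests whether a candidate pair $(r, R)$ satisfies the conditions of Theorem 5.11. It only verifies a given assignment and does not search for one; the function name, the `(u, v, w)` edge encoding, and the tolerance `eps` for floating-point round-off are all assumptions of this sketch.

```python
def satisfies_theorem_5_11(vertices, edges, d, c, r, R, eps=1e-9):
    """Check the mixed-integer conditions of Theorem 5.11 for given r and R.

    edges: list of (u, v, w) with w = w(e); d: vertex delays; c: clock period.
    r must be integer-valued; R may be real; eps absorbs float round-off.
    """
    if any(r[v] != int(r[v]) for v in vertices):
        return False
    for v in vertices:
        if R[v] - r[v] > -d[v] / c + eps:        # R(v) - r(v) <= -d(v)/c
            return False
        if r[v] - R[v] > 1 + eps:                # r(v) - R(v) <= 1
            return False
    for u, v, w in edges:
        if r[v] - r[u] > w + eps:                # r(v) - r(u) <= w(e)
            return False
        if R[v] - R[u] > w - d[v] / c + eps:     # R(v) - R(u) <= w(e) - d(v)/c
            return False
    return True
```

For a two-vertex ring with unit delays and one register on each edge, $r \equiv 0$ and $R \equiv -1$ satisfy all four conditions for $c = 1$, while the same assignment fails for $c = 0.5$.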


The number of registers $S(G_r)$ in the retimed circuit $G_r$ is

$$S(G_r) = \sum_{u \xrightarrow{e} v} w_r(e) = \sum_{u \xrightarrow{e} v} \bigl(w(e) + r(u) - r(v)\bigr) = S(G) + \sum_{v \in V} r(v)\bigl(\mathrm{outdegree}(v) - \mathrm{indegree}(v)\bigr),$$

where $S(G)$ is the number of registers in the original circuit. Since $S(G)$ is constant, minimizing $S(G_r)$ is equivalent to minimizing the quantity

$$\sum_{v \in V} r(v)\bigl(\mathrm{outdegree}(v) - \mathrm{indegree}(v)\bigr),$$




which is a linear combination of the $r(v)$, since $\mathrm{outdegree}(v) - \mathrm{indegree}(v)$ is constant for each $v$. Now, using Theorem 5.11, we can state the register minimization problem in its mixed-integer form:

Problem STMIN (State Minimization) Given a synchronous circuit $G = (V, E, d, w)$ and a positive number $c$, determine a retiming $r$ of $G$ such that $\Phi(G_r) \le c$ and $G_r$ has the minimum number of registers. Equivalently, find an assignment of a real value $R(v)$ and an integer value $r(v)$ to each vertex $v \in V$ that minimizes $\sum_{v \in V} r(v)\bigl(\mathrm{outdegree}(v) - \mathrm{indegree}(v)\bigr)$ subject to

$$R(v) - r(v) \le -d(v)/c, \qquad (5.8)$$

$$r(v) - R(v) \le 1, \qquad (5.9)$$

for every vertex $v \in V$, and

$$r(v) - r(u) \le w(e), \qquad (5.10)$$

$$R(v) - R(u) \le w(e) - d(v)/c, \qquad (5.11)$$

wherever $u \xrightarrow{e} v$. □
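The identity behind the objective of Problem STMIN can be checked numerically. The sketch below assumes $w_r(e) = w(e) + r(u) - r(v)$ for an edge $u \xrightarrow{e} v$, the convention under which constraint (5.10) expresses $w_r(e) \ge 0$. The function names are hypothetical, and the code does not check that $r$ is a legal retiming.

```python
from collections import Counter


def retimed_register_count(edges, r):
    """S(G_r): total registers after retiming, with w_r(e) = w(e) + r(u) - r(v)."""
    return sum(w + r[u] - r[v] for u, v, w in edges)


def stmin_objective(vertices, edges, r):
    """Objective of Problem STMIN: sum of r(v) * (outdegree(v) - indegree(v))."""
    outdeg = Counter(u for u, _, _ in edges)
    indeg = Counter(v for _, v, _ in edges)
    return sum(r[v] * (outdeg[v] - indeg[v]) for v in vertices)
```

For the four-edge example in the test, $S(G) = 4$, the objective equals $-1$, and the retimed circuit has $4 - 1 = 3$ registers, matching $S(G_r) = S(G) + \sum_{v} r(v)(\mathrm{outdegree}(v) - \mathrm{indegree}(v))$.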


The state minimization problem on $G = (V, E, d, w)$ can be seen from the perspective of the mixed-integer problem RMI-Dual-Flow on an uncapacitated network $G' = (V', E', w')$. The graph $G'$ is defined as follows.

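$$V' = \{v_i : v \in V,\ i = 1, 2\},$$

$$E' = E'_1 \cup E'_2 \cup E'_3 \cup E'_4,$$

where

$$E'_1 = \{v_1 \to v_2 : v \in V\},$$

$$E'_2 = \{v_2 \to v_1 : v \in V\},$$

$$E'_3 = \{u_1 \to v_1 : u \xrightarrow{e} v \in E\},$$

$$E'_4 = \{u_2 \to v_2 : u \xrightarrow{e} v \in E\}.$$

The edge-weight of each edge $e \in E'$ is

$$w'(e) = \begin{cases} -d(v)/c & \text{if } e = v_1 \to v_2 \in E'_1, \\ 1 & \text{if } e = v_2 \to v_1 \in E'_2, \\ w(e) & \text{if } e = u_1 \to v_1 \in E'_3, \\ w(e) - d(v)/c & \text{if } e = u_2 \to v_2 \in E'_4. \end{cases}$$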

The unknown $r(v)$ of the state minimization problem corresponds to $x(v_1)$, and the unknown $R(v)$ corresponds to $x(v_2)$. The function $b$ is defined on $V'$ as $b(v_1) = \mathrm{indegree}(v) - \mathrm{outdegree}(v)$ and $b(v_2) = 0$ for every vertex $v \in V$. Finally, $V'_I = \{v_1 : v \in V\}$.
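The construction of $G'$ is mechanical, so a short Python sketch may help. It encodes $v_1$ and $v_2$ as the tuples `(v, 1)` and `(v, 2)`, takes edges of $G$ as `(u, v, w)` triples, and returns the arc list of $G'$, the supply function $b$, and $V'_I$. The encoding and the function name are assumptions; solving the resulting instance would still require Algorithm RMI-Dual-Flow.

```python
from collections import Counter


def build_stmin_network(vertices, edges, d, c):
    """Construct the arcs of G', the supplies b, and V'_I for Problem STMIN.

    Vertex (v, 1) carries r(v) and (v, 2) carries R(v); each arc (x, y, w')
    encodes the difference constraint value(y) - value(x) <= w'.
    """
    arcs = []
    for v in vertices:
        arcs.append(((v, 1), (v, 2), -d[v] / c))     # E'_1: R(v) - r(v) <= -d(v)/c
        arcs.append(((v, 2), (v, 1), 1))             # E'_2: r(v) - R(v) <= 1
    for u, v, w in edges:
        arcs.append(((u, 1), (v, 1), w))             # E'_3: r(v) - r(u) <= w(e)
        arcs.append(((u, 2), (v, 2), w - d[v] / c))  # E'_4: R(v) - R(u) <= w(e) - d(v)/c
    indeg = Counter(v for _, v, _ in edges)
    outdeg = Counter(u for u, _, _ in edges)
    b = {(v, 1): indeg[v] - outdeg[v] for v in vertices}
    b.update({(v, 2): 0 for v in vertices})
    integer_vertices = [(v, 1) for v in vertices]
    return arcs, b, integer_vertices
```

Because each edge of $G$ adds one to the indegree of its head and one to the outdegree of its tail, the supplies $b(v_1)$ always sum to zero, as a min-cost flow instance requires.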

In summary, in this chapter we gave a solution to a mixed-integer optimization problem. We identified the problem as the restricted mixed-integer dual of an uncapacitated minimum-cost flow by observing that its linear programming relaxation is the dual of an uncapacitated minimum-cost flow problem. Based on this observation we developed a theoretical framework for its solution and gave a procedure that solves it in $O(V^3 \lg V)$ steps. Finally, we gave an application of our algorithm by reducing the state minimization problem for retiming to the mixed-integer problem that we solved.


Chapter  6 


Conclusion 


In this thesis we have investigated properties of retiming, a synchronous circuitry optimization technique. We presented specialized, fast algorithms for retiming of combinational circuitry. Specifically, we showed that combinational circuitry can be pipelined with minimum latency in $O(E)$ steps, which is optimal within a constant factor. Clock-period minimization of combinational circuitry can be achieved in $O(E \lg D)$ steps, where $D$ is the maximum component delay in the circuit. We presented a novel and concise graph-theoretic characterization of the minimum clock-period of a circuit. Based on this characterization we gave improved techniques for minimum clock-period retiming of sequential circuitry. We presented an 0(min{V'*/^£'lg(V^Z?),  V"£}) algorithm for minimum clock-period retiming of unit-delay circuitry, and an $O(VE \lg D)$ algorithm for minimum clock-period retiming of general circuitry. We also showed that a retiming of a general circuit with a clock-period that does not exceed the minimum by more than $D$ can be found in 0(  min{  lg(  KW)  lg(  KZ?),  V"£' lg( FZ))})  steps.

Subsequently, we exhibited the closed semiring structure of retiming and gave an algorithm that operates on this structure. Finally, we gave an $O(V^3 \lg V)$-time algorithm for a mixed-integer optimization problem that arises in the linear programming framework of retiming.

There  are  still  open  questions  of  both  practical  and  theoretical  interest  in  the  area. 
It is an interesting question whether there exists an algorithm for minimum clock-period retiming of general circuits that matches the running time of the algorithm for the same problem on unit-delay circuitry. Decoupling the running time of our algorithm for the mixed-integer optimization Problem RMI-Dual-Flow from the number of new constraints introduced would also be an interesting extension of the techniques presented in this thesis. Finally, proving the conjecture that Problem MI-Dual-Flow is intractable would fully elucidate the problem of optimizing mixed-integer difference constraints. Our conjecture is supported by the fact that the feasible vectors of Problem MI-Dual-Flow do not form a convex set, as well as by the fact that the solutions to Problem MI-Dual-Flow do not necessarily exhibit the optimal substructure property. Lack of convexity and optimal substructure rules out linear programming and dynamic programming approaches that could lead to polynomial-time algorithms.